[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[altq 1549] Re: HFSC-ALTQ rtsc_min() ...mmm... peculiarity
On Tue, 25 Jun 2002, Kenjiro Cho wrote:
> > #define SM_MASK (0xffffffffffffffffLL << SM_SHIFT)
> This should be
> #define SM_MASK ((1LL << SM_SHIFT) - 1)
Sure, you are right. I got it inverted :)
> I think it is better to always do 2 multiplications.
> How about this one?
> static __inline u_int64_t
> seg_x2y(x, sm)
> u_int64_t x;
> u_int64_t sm;
> u_int64_t y;
> y = x >> SM_SHIFT;
> if (y != 0)
> y *= sm;
> y += ((x & SM_MASK) * sm) >> SM_SHIFT;
> return (y);
Hmm... It will certainly work. However it will do those 2 multiplications
almost always. I mean that (y != 0) will be true for all x >= 2^24. If we
interpret x as real time this amounts to 16M clock ticks or about 0.02 sec
for 800 MHz CPU, i.e. after roughly first 0.02 sec of baklogged period we
are forced to do 2 multiplications. Since this function is called rather
often (in fact so often, that you bothered to suggest comipler to __inline
it), we'd better minimize it as much as possible, shouldn't we?
Well, I admit that I didn't look into modern CPU manuals lately to see how
many CPU clocks the multiplications operation is worth. Probably it is
negligible. But from the most general standpoint, why do more if we can
get away with less?
Even without modifying runtime_sc we can put things like this:
#define SC_X_LARGEVAL (1LL << 40)
#define SM_MASK ((1LL << SM_SHIFT) - 1)
static __inline u_int64_t
if (x < SC_X_LARGEVAL)
return x * sm >> SM_SHIFT;
return ((x >> SM_SHIFT) * sm) +
(((x & SM_MASK) * sm) >> SM_SHIFT);
This is actually almost your old seg_x2y() function. It will work
correctly for sm <= 2^24, which is always so for all reasonable
configurations (as I have shown in my previous message). And it will
perform only one multiplication and one shift for all x < 2^40, i.e.
~22 min for 800MHz CPU which is much better than 0.02 sec.
It is probably even more important for seg_y2x() function that is called
more often (by update_...() functions). Your version will do 2
multiplications after the first 2^10 bytes which amounts roughly to one
packet. The version similar to the seg_x2y() above with SC_Y_LARGEVAL
defined as (1LL << 32) will perform one multiplication until y reaches
4GB which is certainly happens rather less often. ;)
Now if we don't mind those few additional values in runtime_sc we could
calculate individually optimal x- & y-limits for particular curve and
probably even "personalized" amount of shift bits to maximize accuracy
and minimize multiplications. This is done only once, when curve is
installed, so we can afford almost arbitrarily complex precalculations.
These were "pros". The "cons" include: 1) certain increase in the size of
runtime_sc (to me, this is affordable), 2) functions seg_x2y() and
seg_y2x() get one or two extra parameters (should do no harm, since the
functions are inlined anyway), 3) comparison to a variable instead of a
constant and shift to variable amount of bits instead of a constant might
be less effective because of no compiler optimization or something alike
(no comments, since I didn't look at the actual compiler-generated code
Anyway, the above is just MHO.