[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[altq 1566] Next round of HFSC-ALTQ things...



Hi Kenjiro,

On Thu, 27 Jun 2002, Kenjiro Cho wrote:

> Oleg Cherevko wrote:
> > Now if we don't mind those few additional values in runtime_sc we could
> > calculate individually optimal x- & y-limits for particular curve and
> > probably even "personalized" amount of shift bits to maximize accuracy
> > and minimize multiplications. This is done only once, when curve is
> > installed, so we can afford almost arbitrarily complex precalculations.
> > These were "pros". The "cons" include: 1) certain increase in the size of
> > runtime_sc (to me, this is affordable), 2) functions seg_x2y() and 
> > seg_y2x() get one or two extra parameters (should do no harm, since the
> > functions are inlined anyway), 3) comparison to a variable instead of a
> > constant and shift to variable amount of bits instead of a constant might
> > be less effective because of no compiler optimization or something alike 
> > (no comments, since I didn't look at the actual compiler-generated code
> > yet).
> > 
> > Anyway, the above is just MHO.
> 
> I was thinking more about readability than efficiency.
> 
> I have CMU's userland simulator that I used to optimize the hfsc code
> with gprof(1).  I don't remember the details of this simulator but, if
> you are interested, I have the source code.

Thanks, but to me it seems a bit early to do explicit optimizations with
profiler right now. I'd rather release something working first. :)

> Here's part of the gprof output.
[...skipped...]

Thanks, I found it to be interesting.

----------------------------------------------------------------------
Now to the new stuff.
First, the update on the seq_y2x() & Co. After installing correct patched
versions of seq_y2x() and seq_x2y() it turned out that things did improve
for rtsc_min() but only a bit. I still got the very same type of error
with this data, for example:
cl_virtual: {
        x = 0
        y = 0x508
        sm1 = 0
        ism1 = 0xffffffffffffffff (i.e. SC_INFINITY)
        dx = 0
        dy = 0
        sm2 = 0xb26
        ism2 = 0x5bd450
}
machclk_freq this time was 734634842.
The rtsc_min() operation:
        rtsc_min(&cl->cl_virtual, cl->cl_fsc, x, y);
where
        x = 0xaeef30d85
        y = 0x79ec46
Additional values:
        y1 = rtsc_x2y(&cl->cl_virtual, x) = 0x79e8f1
        x1 = rtsc_y2x(&cl->cl_virtual, y) = 0xaee932866
Once again, x > x1, so the new curve (linear) lies to the right,
i.e. is smaller, and y1 should be greater than y, but instead y > y1.

With this it became clear to me that the problem is more fundamental
and is caused by rtsc_x2y() and rtsc_y2x() inacurracies that arise because
of sm and ism rounding. In order to improve this I redesigned the RTSC
subsystem of the HFSC-ALTQ.

This redesign is basically a combination of the two major changes:

1. The rtsc_x2y() is gone. Completely gone.
   As you may see, the rtsc_x2y() function is used by rtsc_min() only,
   while rtsc_y2x() is used elsewhere. Now, one can observe that because
   our service curves are nondecreasing, instead of doing min() operation
   with respect to y we can do max() operation with respect to x.
   In plain English: the curve that is minimum of the two argument curves
   necessarily is the "rightmost" of them and vice versa.
   So I rewrote rtsc_min() operation in terms of calculation of
   the "rightmost" curve. This calculation requires only rtsc_y2x()
   operation (with a minor exception in the case concave curves
   intersect). The benefits of this unification are: 
   1) since only one curve-mapping function (rtsc_y2x) is used by all
      parts of the system its results are consistent through all the
      system, i.e. there is no way to get something like 
      rtsc_y2x(rtsc_x2y(x)) != x, that is pretty possible with current
      two-function RTSC subsystem;
   2) in general, y (bytes) is much more coarse-grained than x (CPU clock
      ticks), this means that if y1 != y2 then rtsc_y2x(y1) !=
      rtsc_y2x(y2), while it is pretty possible that for sufficiently
      close x1 != x2 we would get rtsc_x2y(x1) == rtsc_x2y(x2) because of
      rounding errors;
   3) since rtsc_x2y() is gone, so are the sm1 and sm2, only their inverse
      counterparts are kept in the struct runtime_sc.

2. The second major change deals with how the ism values are stored in the
   struct runtime_sc. You used 64-bit fixed-point format with 54-bit
   integral part and 10-bit fractional part. I switched to the sort of
   floating point instead (custom-made and integer-based, no floats, no
   doubles). Now ism value is stored as two integer values: an u_int64_t 
   "ism" that serves as mantissa and u_int "ism_shift" that keeps sort of
   exponent (actually, number of bit that ism was left-shifted, and the
   result of multiplication to ism should be right-shifted). The "ism"
   mantissa is always kept with 32-bit accuracy which gives us 2^(-32)
   or ~2.5e-10 error. This basically means that we can specify ~1Gbps m
   values in curves with ~1bps accuracy. What is more important that this
   1bps accuracy is available along the whole 1bps-1Gbps range.
   The variable amount of shift bits translates in seg_y2x() to only 4
   additional assembler commands (8-bit "test", "jz", 32-bit "mov", and
   32-bit "xor") which is pretty affordable. BTW, "ism" can be compacted
   to u_int32_t and for "ism_shift" u_int8_t is more than enough.

I kept the old RTSC subsystem in the code and made the redesigned RTSC
stuff depend on the HFSC_NEW_RTSC_STUFF define that is turned off by
default for pure HFSC-ALTQ but is required for my HAFSC extensions.

----------------------------------------------------------------------
Now to something different.
I have found rather annoying bug/misfeature/whatever in the
.../libaltq/qop_hfsc.c code. This piece of code is sometimes causing
nasty troubles:
        /*
         * if the link-sharing service curve is diffrent from
         * the real-time service curve, we first create a class with the
         * smaller service curve and then modify the other service curve.
         */
        if (rm2 <= fm2) {
                m1 = rm1; d = rd; m2 = rm2;
        } else {
                m1 = fm1; d = fd; m2 = fm2;
        }
        error = qcmd_hfsc_add_class(ifname, class_name, parent_name,
                                    m1, d, m2, qlimit, flags);

        if (error == 0 && (rm1 != fm1 || rd != fd || rm2 != fm2)) {
                if (rm2 <= fm2) {
                        m1 = fm1; d = fd; m2 = fm2; type = HFSC_LINKSHARINGSC;
                } else {
                        m1 = rm1; d = rd; m2 = rm2; type = HFSC_REALTIMESC;
                }
                error = qcmd_hfsc_modify_class(ifname, class_name,
                                               m1, d, m2, type);
        }

In this code the (rough) minimum of real-time and fair/link-sharing curves
is determined by effectively comparing them on the infinity (thus only rm2
and fm2 values matter), then installing this minimum for both curves, then
modifying the class to install the actual real-time or fair/link-sharing
curve. Unfortunately sometimes this simply does not work at all.
Consider the following example:

  class hfsc <interface> parent grandparent   grate 64K  [ls 0 0 64K]
  class hfsc <interface> child1 parent [rt   0 1000 48K] [ls 0 0 32K]
  class hfsc <interface> child2 parent [rt 64K 1000 16K] [ls 0 0 32K]

It is obvious that all the admission constraints are satisfied. But if you
try to add child2 with qcmd_hfsc_add_class() by the above code it will
try to add it initially with both curves set to [64K 1000 16K] that
violates fair/link-sharing admission constraints for the first 1000ms.
It is interesting that trying to add child2 with both curves set to 
[0 0 32K] would violate the real-time admission constraints after those
initial 1000ms. Deadlock? Seems so. And the error message
"hfsc_class_parser: admission failure (no bandwidth)" can pretty well
turn the user insane trying to figure out what is wrong with his/her
calculator. :) :(

One possible (ugly) solution would be to add new class with both curves
zeroed, then modify them by two calls to qcmd_hfsc_modify_class().
Otherwise it seems that a new interface call for kernel part of HFSC-ALTQ
is required that would allow both curves to be specified independently and
at once.

----------------------------------------------------------------------
The last thing.
I'd suggest to add one more check to validate_sc() from
.../libaltq/qop_hfsc.c to ensure that in concave curves m2 != 0, like
this:
...
	if (sc->m1 > sc->m2 && sc->m2 == 0) {
		LOG(LOG_ERR, 0, "m2 must be nonzero for concave!");
		return (-1);
	}
...
I really hope nobody in their right mind would ever try to specify
concave curve with m2=0 (because what such a curve actually means is that
after single initial burst of m1*d bytes there would be no more traffic
allowed at all) but who knows? :)

-- 
Olwi