Does anyone really understand how HFSC scheduling in Linux/BSD works?
Posted
by Mecki
on Server Fault
See other posts from Server Fault
or by Mecki
Published on 2010-01-21T15:01:17Z
Indexed on
2010/05/23
18:32 UTC
Read the original article
Hit count: 520
I read the original SIGCOMM '97 PostScript paper about HFSC, it is very technically, but I understand the basic concept. Instead of giving a linear service curve (as with pretty much every other scheduling algorithm), you can specify a convex or concave service curve and thus it is possible to decouple bandwidth and delay. However, even though this paper mentions to kind of scheduling algorithms being used (real-time and link-share), it always only mentions ONE curve per scheduling class (the decoupling is done by specifying this curve, only one curve is needed for that).
Now HFSC has been implemented for BSD (OpenBSD, FreeBSD, etc.) using the ALTQ scheduling framework and it has been implemented Linux using the TC scheduling framework (part of iproute2). Both implementations added two additional service curves, that were NOT in the original paper! A real-time service curve and an upper-limit service curve. Again, please note that the original paper mentions two scheduling algorithms (real-time and link-share), but in that paper both work with one single service curve. There never have been two independent service curves for either one as you currently find in BSD and Linux.
Even worse, some version of ALTQ seems to add an additional queue priority to HSFC (there is no such thing as priority in the original paper either). I found several BSD HowTo's mentioning this priority setting (even though the man page of the latest ALTQ release knows no such parameter for HSFC, so officially it does not even exist).
This all makes the HFSC scheduling even more complex than the algorithm described in the original paper and there are tons of tutorials on the Internet that often contradict each other, one claiming the opposite of the other one. This is probably the main reason why nobody really seems to understand how HFSC scheduling really works. Before I can ask my questions, we need a sample setup of some kind. I'll use a very simple one as seen in the image below:
Here are some questions I cannot answer because the tutorials contradict each other:
What for do I need a real-time curve at all? Assuming A1, A2, B1, B2 are all 128 kbit/s link-share (no real-time curve for either one), then each of those will get 128 kbit/s if the root has 512 kbit/s to distribute (and A and B are both 256 kbit/s of course), right? Why would I additionally give A1 and B1 a real-time curve with 128 kbit/s? What would this be good for? To give those two a higher priority? According to original paper I can give them a higher priority by using a curve, that's what HFSC is all about after all. By giving both classes a curve of [256kbit/s 20ms 128kbit/s] both have twice the priority than A2 and B2 automatically (still only getting 128 kbit/s on average)
Does the real-time bandwidth count towards the link-share bandwidth? E.g. if A1 and B1 both only have 64kbit/s real-time and 64kbit/s link-share bandwidth, does that mean once they are served 64kbit/s via real-time, their link-share requirement is satisfied as well (they might get excess bandwidth, but lets ignore that for a second) or does that mean they get another 64 kbit/s via link-share? So does each class has a bandwidth "requirement" of real-time plus link-share? Or does a class only have a higher requirement than the real-time curve if the link-share curve is higher than the real-time curve (current link-share requirement equals specified link-share requirement minus real-time bandwidth already provided to this class)?
Is upper limit curve applied to real-time as well, only to link-share, or maybe to both? Some tutorials say one way, some say the other way. Some even claim upper-limit is the maximum for real-time bandwidth + link-share bandwidth? What is the truth?
Assuming A2 and B2 are both 128 kbit/s, does it make any difference if A1 and B1 are 128 kbit/s link-share only, or 64 kbit/s real-time and 128 kbit/s link-share, and if so, what difference?
If I use the seperate real-time curve to increase priorities of classes, why would I need "curves" at all? Why is not real-time a flat value and link-share also a flat value? Why are both curves? The need for curves is clear in the original paper, because there is only one attribute of that kind per class. But now, having three attributes (real-time, link-share, and upper-limit) what for do I still need curves on each one? Why would I want the curves shape (not average bandwidth, but their slopes) to be different for real-time and link-share traffic?
According to the little documentation available, real-time curve values are totally ignored for inner classes (class A and B), they are only applied to leaf classes (A1, A2, B1, B2). If that is true, why does the ALTQ HFSC sample configuration (search for 3.3 Sample configuration) set real-time curves on inner classes and claims that those set the guaranteed rate of those inner classes? Isn't that completely pointless? (note: pshare sets the link-share curve in ALTQ and grate the real-time curve; you can see this in the paragraph above the sample configuration).
Some tutorials say the sum of all real-time curves may not be higher than 80% of the line speed, others say it must not be higher than 70% of the line speed. Which one is right or are they maybe both wrong?
One tutorial said you shall forget all the theory. No matter how things really work (schedulers and bandwidth distribution), imagine the three curves according to the following "simplified mind model": real-time is the guaranteed bandwidth that this class will always get. link-share is the bandwidth that this class wants to become fully satisfied, but satisfaction cannot be guaranteed. In case there is excess bandwidth, the class might even get offered more bandwidth than necessary to become satisfied, but it may never use more than upper-limit says. For all this to work, the sum of all real-time bandwidths may not be above xx% of the line speed (see question above, the percentage varies). Question: Is this more or less accurate or a total misunderstanding of HSFC?
And if assumption above is really accurate, where is prioritization in that model? E.g. every class might have a real-time bandwidth (guaranteed), a link-share bandwidth (not guaranteed) and an maybe an upper-limit, but still some classes have higher priority needs than other classes. In that case I must still prioritize somehow, even among real-time traffic of those classes. Would I prioritize by the slope of the curves? And if so, which curve? The real-time curve? The link-share curve? The upper-limit curve? All of them? Would I give all of them the same slope or each a different one and how to find out the right slope?
I still haven't lost hope that there exists at least a hand full of people in this world that really understood HFSC and are able to answer all these questions accurately. And doing so without contradicting each other in the answers would be really nice ;-)
© Server Fault or respective owner