Search Results

Search found 12 results on 1 pages for 'quantization'.

Page 1/1 | 1 

  • 16 millisecond quantization when sending/receivingtcp packets

    - by MKZ
    Hi, I have a C++ application running on windows xp 32 system sending and receiving short tcp/ip packets. Measuring (accurately) the arrival time I see a quantization of the arrival time to 16 millisecond time units. (Meaning all packets arriving are at (16 )xN milliseconds separated from each other) To avoid packet aggregation I tried to disable the NAGLE algorithm by setting the IPPROTO_TCP option to TCP_NODELAY in the socket variables but it did not help I suspect that the problem is related to the windows schedular which also have a 16 millisecond clock.. Any idea of a solution to this problem ? Thanks

    Read the article

  • Color banding only on Android 4.0+

    - by threeshinyapples
    On emulators running Android 4.0 or 4.0.3, I am seeing horrible colour banding which I can't seem to get rid of. On every other Android version I have tested, gradients look smooth. I have a SurfaceView which is configured as RGBX_8888, and the banding is not present in the rendered canvas. If I manually dither the image by overlaying a noise pattern at the end of rendering I can make the gradients smooth again, though obviously at a cost to performance which I'd rather avoid. So the banding is being introduced later. I can only assume that, on 4.0+, my SurfaceView is being quantized to a lower bit-depth at some point between it being drawn and being displayed, and I can see from a screen capture that gradients are stepping 8 values at a time in each channel, suggesting a quantization to 555 (not 565). I added the following to my Activity onCreate function, but it made no difference. getWindow().setFormat(PixelFormat.RGBA_8888); getWindow().addFlags(WindowManager.LayoutParams.FLAG_DITHER); I also tried putting the above in onAttachedToWindow() instead, but there was still no change. (I believe that RGBA_8888 is the default window format anyway for 2.2 and above, so it's little surprise that explicitly setting that format has no effect on 4.0+.) Which leaves the question, if the source is 8888 and the destination is 8888, what is introducing the quantization/banding and why does it only appear on 4.0+? Very puzzling. I wonder if anyone can shed some light?

    Read the article

  • Is the Leptonica implementation of 'Modified Median Cut' not using the median at all?

    - by TheCodeJunkie
    I'm playing around a bit with image processing and decided to read up on how color quantization worked and after a bit of reading I found the Modified Median Cut Quantization algorithm. I've been reading the code of the C implementation in Leptonica library and came across something I thought was a bit odd. Now I want to stress that I am far from an expert in this area, not am I a math-head, so I am predicting that this all comes down to me not understanding all of it and not that the implementation of the algorithm is wrong at all. The algorithm states that the vbox should be split along the lagest axis and that it should be split using the following logic The largest axis is divided by locating the bin with the median pixel (by population), selecting the longer side, and dividing in the center of that side. We could have simply put the bin with the median pixel in the shorter side, but in the early stages of subdivision, this tends to put low density clusters (that are not considered in the subdivision) in the same vbox as part of a high density cluster that will outvote it in median vbox color, even with future median-based subdivisions. The algorithm used here is particularly important in early subdivisions, and 3is useful for giving visible but low population color clusters their own vbox. This has little effect on the subdivision of high density clusters, which ultimately will have roughly equal population in their vboxes. For the sake of the argument, let's assume that we have a vbox that we are in the process of splitting and that the red axis is the largest. In the Leptonica algorithm, on line 01297, the code appears to do the following Iterate over all the possible green and blue variations of the red color For each iteration it adds to the total number of pixels (population) it's found along the red axis For each red color it sum up the population of the current red and the previous ones, thus storing an accumulated value, for each red note: when I say 'red' I mean each point along the axis that is covered by the iteration, the actual color may not be red but contains a certain amount of red So for the sake of illustration, assume we have 9 "bins" along the red axis and that they have the following populations 4 8 20 16 1 9 12 8 8 After the iteration of all red bins, the partialsum array will contain the following count for the bins mentioned above 4 12 32 48 49 58 70 78 86 And total would have a value of 86 Once that's done it's time to perform the actual median cut and for the red axis this is performed on line 01346 It iterates over bins and check they accumulated sum. And here's the part that throws me of from the description of the algorithm. It looks for the first bin that has a value that is greater than total/2 Wouldn't total/2 mean that it is looking for a bin that has a value that is greater than the average value and not the median ? The median for the above bins would be 49 The use of 43 or 49 could potentially have a huge impact on how the boxes are split, even though the algorithm then proceeds by moving to the center of the larger side of where the matched value was.. Another thing that puzzles me a bit is that the paper specified that the bin with the median value should be located, but does not mention how to proceed if there are an even number of bins.. the median would be the result of (a+b)/2 and it's not guaranteed that any of the bins contains that population count. So this is what makes me thing that there are some approximations going on that are negligible because of how the split actually takes part at the center of the larger side of the selected bin. Sorry if it got a bit long winded, but I wanted to be as thoroughas I could because it's been driving me nuts for a couple of days now ;)

    Read the article

  • GDC 2012: DXT is NOT ENOUGH! Advanced texture compression for games

    GDC 2012: DXT is NOT ENOUGH! Advanced texture compression for games (Pre-recorded GDC content) Tired of fighting to fit your textures on disk? Too many bad reviews on long download times? Fix it! Don't settle for putting your raw DXT files in a ZIP, instead, compress your DXT textures by an extra 50%-70%! This talk will cover various ways to increase the compression of your game textures to allow for smaller distributables without introducing error, and allowing for fast on-demand decompression at run time. We'll cover how to losslessly squeeze your data with Huffman, block expansion, vector quantization, and we'll even take a look at what MegaTexture is doing too. If you've ever fought to fit textures into memory, this is the talk for you. Speaker: Colt McAnlis From: GoogleDevelopers Views: 1132 21 ratings Time: 33:05 More in Science & Technology

    Read the article

  • Find the closest vector

    - by Alexey Lebedev
    Hello! Recently I wrote the algorithm to quantize an RGB image. Every pixel is represented by an (R,G,B) vector, and quantization codebook is a couple of 3-dimensional vectors. Every pixel of the image needs to be mapped to (say, "replaced by") the codebook pixel closest in terms of euclidean distance (more exactly, squared euclidean). I did it as follows: class EuclideanMetric(DistanceMetric): def __call__(self, x, y): d = x - y return sqrt(sum(d * d, -1)) class Quantizer(object): def __init__(self, codebook, distanceMetric = EuclideanMetric()): self._codebook = codebook self._distMetric = distanceMetric def quantize(self, imageArray): quantizedRaster = zeros(imageArray.shape) X = quantizedRaster.shape[0] Y = quantizedRaster.shape[1] for i in xrange(0, X): print i for j in xrange(0, Y): dist = self._distMetric(imageArray[i,j], self._codebook) code = argmin(dist) quantizedRaster[i,j] = self._codebook[code] return quantizedRaster ...and it works awfully, almost 800 seconds on my Pentium Core Duo 2.2 GHz, 4 Gigs of memory and an image of 2600*2700 pixels:( Is there a way to somewhat optimize this? Maybe the other algorithm or some Python-specific optimizations.

    Read the article

  • How to perform spatial partitioning in n-dimensions?

    - by kevin42
    I'm trying to design an implementation of Vector Quantization as a c++ template class that can handle different types and dimensions of vectors (e.g. 16 dimension vectors of bytes, or 4d vectors of doubles, etc). I've been reading up on the algorithms, and I understand most of it: here and here I want to implement the Linde-Buzo-Gray (LBG) Algorithm, but I'm having difficulty figuring out the general algorithm for partitioning the clusters. I think I need to define a plane (hyperplane?) that splits the vectors in a cluster so there is an equal number on each side of the plane. [edit to add more info] This is an iterative process, but I think I start by finding the centroid of all the vectors, then use that centroid to define the splitting plane, get the centroid of each of the sides of the plane, continuing until I have the number of clusters needed for the VQ algorithm (iterating to optimize for less distortion along the way). The animation in the first link above shows it nicely. My questions are: What is an algorithm to find the plane once I have the centroid? How can I test a vector to see if it is on either side of that plane?

    Read the article

  • Extracting DCT coefficients from encoded images and video

    - by misha
    Is there a way to easily extract the DCT coefficients (and quantization parameters) from encoded images and video? Any decoder software must be using them to decode block-DCT encoded images and video. So I'm pretty sure the decoder knows what they are. Is there a way to expose them to whomever is using the decoder? I'm implementing some video quality assessment algorithms that work directly in the DCT domain. Currently, the majority of my code uses OpenCV, so it would be great if anyone knows of a solution using that framework. I don't mind using other libraries (perhaps libjpeg, but that seems to be for still images only), but my primary concern is to do as little format-specific work as possible (I don't want to reinvent the wheel and write my own decoders). I want to be able to open any video/image (H.264, MPEG, JPEG, etc) that OpenCV can open, and if it's block DCT-encoded, to get the DCT coefficients. In the worst case, I know that I can write up my own block DCT code, run the decompressed frames/images through it and then I'd be back in the DCT domain. That's hardly an elegant solution, and I hope I can do better. Presently, I use the fairly common OpenCV boilerplate to open images: IplImage *image = cvLoadImage(filename); // Run quality assessment metric The code I'm using for video is equally trivial: CvCapture *capture = cvCaptureFromAVI(filename); while (cvGrabFrame(capture)) { IplImage *frame = cvRetrieveFrame(capture); // Run quality assessment metric on frame } cvReleaseCapture(&capture); In both cases, I get a 3-channel IplImage in BGR format. Is there any way I can get the DCT coefficients as well?

    Read the article

  • Steganography Experiment - Trouble hiding message bits in DCT coefficients

    - by JohnHankinson
    I have an application requiring me to be able to embed loss-less data into an image. As such I've been experimenting with steganography, specifically via modification of DCT coefficients as the method I select, apart from being loss-less must also be relatively resilient against format conversion, scaling/DSP etc. From the research I've done thus far this method seems to be the best candidate. I've seen a number of papers on the subject which all seem to neglect specific details (some neglect to mention modification of 0 coefficients, or modification of AC coefficient etc). After combining the findings and making a few modifications of my own which include: 1) Using a more quantized version of the DCT matrix to ensure we only modify coefficients that would still be present should the image be JPEG'ed further or processed (I'm using this in place of simply following a zig-zag pattern). 2) I'm modifying bit 4 instead of the LSB and then based on what the original bit value was adjusting the lower bits to minimize the difference. 3) I'm only modifying the blue channel as it should be the least visible. This process must modify the actual image and not the DCT values stored in file (like jsteg) as there is no guarantee the file will be a JPEG, it may also be opened and re-saved at a later stage in a different format. For added robustness I've included the message multiple times and use the bits that occur most often, I had considered using a QR code as the message data or simply applying the reed-solomon error correction, but for this simple application and given that the "message" in question is usually going to be between 10-32 bytes I have plenty of room to repeat it which should provide sufficient redundancy to recover the true bits. No matter what I do I don't seem to be able to recover the bits at the decode stage. I've tried including / excluding various checks (even if it degrades image quality for the time being). I've tried using fixed point vs. double arithmetic, moving the bit to encode, I suspect that the message bits are being lost during the IDCT back to image. Any thoughts or suggestions on how to get this working would be hugely appreciated. (PS I am aware that the actual DCT/IDCT could be optimized from it's naive On4 operation using row column algorithm, or an FDCT like AAN, but for now it just needs to work :) ) Reference Papers: http://www.lokminglui.com/dct.pdf http://arxiv.org/ftp/arxiv/papers/1006/1006.1186.pdf Code for the Encode/Decode process in C# below: using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Drawing.Imaging; using System.Drawing; namespace ImageKey { public class Encoder { public const int HIDE_BIT_POS = 3; // use bit position 4 (1 << 3). public const int HIDE_COUNT = 16; // Number of times to repeat the message to avoid error. // JPEG Standard Quantization Matrix. // (to get higher quality multiply by (100-quality)/50 .. // for lower than 50 multiply by 50/quality. Then round to integers and clip to ensure only positive integers. public static double[] Q = {16,11,10,16,24,40,51,61, 12,12,14,19,26,58,60,55, 14,13,16,24,40,57,69,56, 14,17,22,29,51,87,80,62, 18,22,37,56,68,109,103,77, 24,35,55,64,81,104,113,92, 49,64,78,87,103,121,120,101, 72,92,95,98,112,100,103,99}; // Maximum qauality quantization matrix (if all 1's doesn't modify coefficients at all). public static double[] Q2 = {1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1}; public static Bitmap Encode(Bitmap b, string key) { Bitmap response = new Bitmap(b.Width, b.Height, PixelFormat.Format32bppArgb); uint imgWidth = ((uint)b.Width) & ~((uint)7); // Maximum usable X resolution (divisible by 8). uint imgHeight = ((uint)b.Height) & ~((uint)7); // Maximum usable Y resolution (divisible by 8). // Start be transferring the unmodified image portions. // As we'll be using slightly less width/height for the encoding process we'll need the edges to be populated. for (int y = 0; y < b.Height; y++) for (int x = 0; x < b.Width; x++) { if( (x >= imgWidth && x < b.Width) || (y>=imgHeight && y < b.Height)) response.SetPixel(x, y, b.GetPixel(x, y)); } // Setup the counters and byte data for the message to encode. StringBuilder sb = new StringBuilder(); for(int i=0;i<HIDE_COUNT;i++) sb.Append(key); byte[] codeBytes = System.Text.Encoding.ASCII.GetBytes(sb.ToString()); int bitofs = 0; // Current bit position we've encoded too. int totalBits = (codeBytes.Length * 8); // Total number of bits to encode. for (int y = 0; y < imgHeight; y += 8) { for (int x = 0; x < imgWidth; x += 8) { int[] redData = GetRedChannelData(b, x, y); int[] greenData = GetGreenChannelData(b, x, y); int[] blueData = GetBlueChannelData(b, x, y); int[] newRedData; int[] newGreenData; int[] newBlueData; if (bitofs < totalBits) { double[] redDCT = DCT(ref redData); double[] greenDCT = DCT(ref greenData); double[] blueDCT = DCT(ref blueData); int[] redDCTI = Quantize(ref redDCT, ref Q2); int[] greenDCTI = Quantize(ref greenDCT, ref Q2); int[] blueDCTI = Quantize(ref blueDCT, ref Q2); int[] blueDCTC = Quantize(ref blueDCT, ref Q); HideBits(ref blueDCTI, ref blueDCTC, ref bitofs, ref totalBits, ref codeBytes); double[] redDCT2 = DeQuantize(ref redDCTI, ref Q2); double[] greenDCT2 = DeQuantize(ref greenDCTI, ref Q2); double[] blueDCT2 = DeQuantize(ref blueDCTI, ref Q2); newRedData = IDCT(ref redDCT2); newGreenData = IDCT(ref greenDCT2); newBlueData = IDCT(ref blueDCT2); } else { newRedData = redData; newGreenData = greenData; newBlueData = blueData; } MapToRGBRange(ref newRedData); MapToRGBRange(ref newGreenData); MapToRGBRange(ref newBlueData); for(int dy=0;dy<8;dy++) { for(int dx=0;dx<8;dx++) { int col = (0xff<<24) + (newRedData[dx+(dy*8)]<<16) + (newGreenData[dx+(dy*8)]<<8) + (newBlueData[dx+(dy*8)]); response.SetPixel(x+dx,y+dy,Color.FromArgb(col)); } } } } if (bitofs < totalBits) throw new Exception("Failed to encode data - insufficient cover image coefficients"); return (response); } public static void HideBits(ref int[] DCTMatrix, ref int[] CMatrix, ref int bitofs, ref int totalBits, ref byte[] codeBytes) { int tempValue = 0; for (int u = 0; u < 8; u++) { for (int v = 0; v < 8; v++) { if ( (u != 0 || v != 0) && CMatrix[v+(u*8)] != 0 && DCTMatrix[v+(u*8)] != 0) { if (bitofs < totalBits) { tempValue = DCTMatrix[v + (u * 8)]; int bytePos = (bitofs) >> 3; int bitPos = (bitofs) % 8; byte mask = (byte)(1 << bitPos); byte value = (byte)((codeBytes[bytePos] & mask) >> bitPos); // 0 or 1. if (value == 0) { int a = DCTMatrix[v + (u * 8)] & (1 << HIDE_BIT_POS); if (a != 0) DCTMatrix[v + (u * 8)] |= (1 << HIDE_BIT_POS) - 1; DCTMatrix[v + (u * 8)] &= ~(1 << HIDE_BIT_POS); } else if (value == 1) { int a = DCTMatrix[v + (u * 8)] & (1 << HIDE_BIT_POS); if (a == 0) DCTMatrix[v + (u * 8)] &= ~((1 << HIDE_BIT_POS) - 1); DCTMatrix[v + (u * 8)] |= (1 << HIDE_BIT_POS); } if (DCTMatrix[v + (u * 8)] != 0) bitofs++; else DCTMatrix[v + (u * 8)] = tempValue; } } } } } public static void MapToRGBRange(ref int[] data) { for(int i=0;i<data.Length;i++) { data[i] += 128; if(data[i] < 0) data[i] = 0; else if(data[i] > 255) data[i] = 255; } } public static int[] GetRedChannelData(Bitmap b, int sx, int sy) { int[] data = new int[8 * 8]; for (int y = sy; y < (sy + 8); y++) { for (int x = sx; x < (sx + 8); x++) { uint col = (uint)b.GetPixel(x,y).ToArgb(); data[(x - sx) + ((y - sy) * 8)] = (int)((col >> 16) & 0xff) - 128; } } return (data); } public static int[] GetGreenChannelData(Bitmap b, int sx, int sy) { int[] data = new int[8 * 8]; for (int y = sy; y < (sy + 8); y++) { for (int x = sx; x < (sx + 8); x++) { uint col = (uint)b.GetPixel(x, y).ToArgb(); data[(x - sx) + ((y - sy) * 8)] = (int)((col >> 8) & 0xff) - 128; } } return (data); } public static int[] GetBlueChannelData(Bitmap b, int sx, int sy) { int[] data = new int[8 * 8]; for (int y = sy; y < (sy + 8); y++) { for (int x = sx; x < (sx + 8); x++) { uint col = (uint)b.GetPixel(x, y).ToArgb(); data[(x - sx) + ((y - sy) * 8)] = (int)((col >> 0) & 0xff) - 128; } } return (data); } public static int[] Quantize(ref double[] DCTMatrix, ref double[] Q) { int[] DCTMatrixOut = new int[8*8]; for (int u = 0; u < 8; u++) { for (int v = 0; v < 8; v++) { DCTMatrixOut[v + (u * 8)] = (int)Math.Round(DCTMatrix[v + (u * 8)] / Q[v + (u * 8)]); } } return(DCTMatrixOut); } public static double[] DeQuantize(ref int[] DCTMatrix, ref double[] Q) { double[] DCTMatrixOut = new double[8*8]; for (int u = 0; u < 8; u++) { for (int v = 0; v < 8; v++) { DCTMatrixOut[v + (u * 8)] = (double)DCTMatrix[v + (u * 8)] * Q[v + (u * 8)]; } } return(DCTMatrixOut); } public static double[] DCT(ref int[] data) { double[] DCTMatrix = new double[8 * 8]; for (int v = 0; v < 8; v++) { for (int u = 0; u < 8; u++) { double cu = 1; if (u == 0) cu = (1.0 / Math.Sqrt(2.0)); double cv = 1; if (v == 0) cv = (1.0 / Math.Sqrt(2.0)); double sum = 0.0; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { double s = data[x + (y * 8)]; double dctVal = Math.Cos((2 * y + 1) * v * Math.PI / 16) * Math.Cos((2 * x + 1) * u * Math.PI / 16); sum += s * dctVal; } } DCTMatrix[u + (v * 8)] = (0.25 * cu * cv * sum); } } return (DCTMatrix); } public static int[] IDCT(ref double[] DCTMatrix) { int[] Matrix = new int[8 * 8]; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { double sum = 0; for (int v = 0; v < 8; v++) { for (int u = 0; u < 8; u++) { double cu = 1; if (u == 0) cu = (1.0 / Math.Sqrt(2.0)); double cv = 1; if (v == 0) cv = (1.0 / Math.Sqrt(2.0)); double idctVal = (cu * cv) / 4.0 * Math.Cos((2 * y + 1) * v * Math.PI / 16) * Math.Cos((2 * x + 1) * u * Math.PI / 16); sum += (DCTMatrix[u + (v * 8)] * idctVal); } } Matrix[x + (y * 8)] = (int)Math.Round(sum); } } return (Matrix); } } public class Decoder { public static string Decode(Bitmap b, int expectedLength) { expectedLength *= Encoder.HIDE_COUNT; uint imgWidth = ((uint)b.Width) & ~((uint)7); // Maximum usable X resolution (divisible by 8). uint imgHeight = ((uint)b.Height) & ~((uint)7); // Maximum usable Y resolution (divisible by 8). // Setup the counters and byte data for the message to decode. byte[] codeBytes = new byte[expectedLength]; byte[] outBytes = new byte[expectedLength / Encoder.HIDE_COUNT]; int bitofs = 0; // Current bit position we've decoded too. int totalBits = (codeBytes.Length * 8); // Total number of bits to decode. for (int y = 0; y < imgHeight; y += 8) { for (int x = 0; x < imgWidth; x += 8) { int[] blueData = ImageKey.Encoder.GetBlueChannelData(b, x, y); double[] blueDCT = ImageKey.Encoder.DCT(ref blueData); int[] blueDCTI = ImageKey.Encoder.Quantize(ref blueDCT, ref Encoder.Q2); int[] blueDCTC = ImageKey.Encoder.Quantize(ref blueDCT, ref Encoder.Q); if (bitofs < totalBits) GetBits(ref blueDCTI, ref blueDCTC, ref bitofs, ref totalBits, ref codeBytes); } } bitofs = 0; for (int i = 0; i < (expectedLength / Encoder.HIDE_COUNT) * 8; i++) { int bytePos = (bitofs) >> 3; int bitPos = (bitofs) % 8; byte mask = (byte)(1 << bitPos); List<int> values = new List<int>(); int zeroCount = 0; int oneCount = 0; for (int j = 0; j < Encoder.HIDE_COUNT; j++) { int val = (codeBytes[bytePos + ((expectedLength / Encoder.HIDE_COUNT) * j)] & mask) >> bitPos; values.Add(val); if (val == 0) zeroCount++; else oneCount++; } if (oneCount >= zeroCount) outBytes[bytePos] |= mask; bitofs++; values.Clear(); } return (System.Text.Encoding.ASCII.GetString(outBytes)); } public static void GetBits(ref int[] DCTMatrix, ref int[] CMatrix, ref int bitofs, ref int totalBits, ref byte[] codeBytes) { for (int u = 0; u < 8; u++) { for (int v = 0; v < 8; v++) { if ((u != 0 || v != 0) && CMatrix[v + (u * 8)] != 0 && DCTMatrix[v + (u * 8)] != 0) { if (bitofs < totalBits) { int bytePos = (bitofs) >> 3; int bitPos = (bitofs) % 8; byte mask = (byte)(1 << bitPos); int value = DCTMatrix[v + (u * 8)] & (1 << Encoder.HIDE_BIT_POS); if (value != 0) codeBytes[bytePos] |= mask; bitofs++; } } } } } } } UPDATE: By switching to using a QR Code as the source message and swapping a pair of coefficients in each block instead of bit manipulation I've been able to get the message to survive the transform. However to get the message to come through without corruption I have to adjust both coefficients as well as swap them. For example swapping (3,4) and (4,3) in the DCT matrix and then respectively adding 8 and subtracting 8 as an arbitrary constant seems to work. This survives a re-JPEG'ing of 96 but any form of scaling/cropping destroys the message again. I was hoping that by operating on mid to low frequency values that the message would be preserved even under some light image manipulation.

    Read the article

  • DTracing TCP congestion control

    - by user12820842
    In a previous post, I showed how we can use DTrace to probe TCP receive and send window events. TCP receive and send windows are in effect both about flow-controlling how much data can be received - the receive window reflects how much data the local TCP is prepared to receive, while the send window simply reflects the size of the receive window of the peer TCP. Both then represent flow control as imposed by the receiver. However, consider that without the sender imposing flow control, and a slow link to a peer, TCP will simply fill up it's window with sent segments. Dealing with multiple TCP implementations filling their peer TCP's receive windows in this manner, busy intermediate routers may drop some of these segments, leading to timeout and retransmission, which may again lead to drops. This is termed congestion, and TCP has multiple congestion control strategies. We can see that in this example, we need to have some way of adjusting how much data we send depending on how quickly we receive acknowledgement - if we get ACKs quickly, we can safely send more segments, but if acknowledgements come slowly, we should proceed with more caution. More generally, we need to implement flow control on the send side also. Slow Start and Congestion Avoidance From RFC2581, let's examine the relevant variables: "The congestion window (cwnd) is a sender-side limit on the amount of data the sender can transmit into the network before receiving an acknowledgment (ACK). Another state variable, the slow start threshold (ssthresh), is used to determine whether the slow start or congestion avoidance algorithm is used to control data transmission" Slow start is used to probe the network's ability to handle transmission bursts both when a connection is first created and when retransmission timers fire. The latter case is important, as the fact that we have effectively lost TCP data acts as a motivator for re-probing how much data the network can handle from the sending TCP. The congestion window (cwnd) is initialized to a relatively small value, generally a low multiple of the sending maximum segment size. When slow start kicks in, we will only send that number of bytes before waiting for acknowledgement. When acknowledgements are received, the congestion window is increased in size until cwnd reaches the slow start threshold ssthresh value. For most congestion control algorithms the window increases exponentially under slow start, assuming we receive acknowledgements. We send 1 segment, receive an ACK, increase the cwnd by 1 MSS to 2*MSS, send 2 segments, receive 2 ACKs, increase the cwnd by 2*MSS to 4*MSS, send 4 segments etc. When the congestion window exceeds the slow start threshold, congestion avoidance is used instead of slow start. During congestion avoidance, the congestion window is generally updated by one MSS for each round-trip-time as opposed to each ACK, and so cwnd growth is linear instead of exponential (we may receive multiple ACKs within a single RTT). This continues until congestion is detected. If a retransmit timer fires, congestion is assumed and the ssthresh value is reset. It is reset to a fraction of the number of bytes outstanding (unacknowledged) in the network. At the same time the congestion window is reset to a single max segment size. Thus, we initiate slow start until we start receiving acknowledgements again, at which point we can eventually flip over to congestion avoidance when cwnd ssthresh. Congestion control algorithms differ most in how they handle the other indication of congestion - duplicate ACKs. A duplicate ACK is a strong indication that data has been lost, since they often come from a receiver explicitly asking for a retransmission. In some cases, a duplicate ACK may be generated at the receiver as a result of packets arriving out-of-order, so it is sensible to wait for multiple duplicate ACKs before assuming packet loss rather than out-of-order delivery. This is termed fast retransmit (i.e. retransmit without waiting for the retransmission timer to expire). Note that on Oracle Solaris 11, the congestion control method used can be customized. See here for more details. In general, 3 or more duplicate ACKs indicate packet loss and should trigger fast retransmit . It's best not to revert to slow start in this case, as the fact that the receiver knew it was missing data suggests it has received data with a higher sequence number, so we know traffic is still flowing. Falling back to slow start would be excessive therefore, so fast recovery is used instead. Observing slow start and congestion avoidance The following script counts TCP segments sent when under slow start (cwnd ssthresh). #!/usr/sbin/dtrace -s #pragma D option quiet tcp:::connect-request / start[args[1]-cs_cid] == 0/ { start[args[1]-cs_cid] = 1; } tcp:::send / start[args[1]-cs_cid] == 1 && args[3]-tcps_cwnd tcps_cwnd_ssthresh / { @c["Slow start", args[2]-ip_daddr, args[4]-tcp_dport] = count(); } tcp:::send / start[args[1]-cs_cid] == 1 && args[3]-tcps_cwnd args[3]-tcps_cwnd_ssthresh / { @c["Congestion avoidance", args[2]-ip_daddr, args[4]-tcp_dport] = count(); } As we can see the script only works on connections initiated since it is started (using the start[] associative array with the connection ID as index to set whether it's a new connection (start[cid] = 1). From there we simply differentiate send events where cwnd ssthresh (congestion avoidance). Here's the output taken when I accessed a YouTube video (where rport is 80) and from an FTP session where I put a large file onto a remote system. # dtrace -s tcp_slow_start.d ^C ALGORITHM RADDR RPORT #SEG Slow start 10.153.125.222 20 6 Slow start 138.3.237.7 80 14 Slow start 10.153.125.222 21 18 Congestion avoidance 10.153.125.222 20 1164 We see that in the case of the YouTube video, slow start was exclusively used. Most of the segments we sent in that case were likely ACKs. Compare this case - where 14 segments were sent using slow start - to the FTP case, where only 6 segments were sent before we switched to congestion avoidance for 1164 segments. In the case of the FTP session, the FTP data on port 20 was predominantly sent with congestion avoidance in operation, while the FTP session relied exclusively on slow start. For the default congestion control algorithm - "newreno" - on Solaris 11, slow start will increase the cwnd by 1 MSS for every acknowledgement received, and by 1 MSS for each RTT in congestion avoidance mode. Different pluggable congestion control algorithms operate slightly differently. For example "highspeed" will update the slow start cwnd by the number of bytes ACKed rather than the MSS. And to finish, here's a neat oneliner to visually display the distribution of congestion window values for all TCP connections to a given remote port using a quantization. In this example, only port 80 is in use and we see the majority of cwnd values for that port are in the 4096-8191 range. # dtrace -n 'tcp:::send { @q[args[4]-tcp_dport] = quantize(args[3]-tcps_cwnd); }' dtrace: description 'tcp:::send ' matched 10 probes ^C 80 value ------------- Distribution ------------- count -1 | 0 0 |@@@@@@ 5 1 | 0 2 | 0 4 | 0 8 | 0 16 | 0 32 | 0 64 | 0 128 | 0 256 | 0 512 | 0 1024 | 0 2048 |@@@@@@@@@ 8 4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@ 23 8192 | 0

    Read the article

1