While working on a mainframe integration project, it occurred to me that some basic computer concepts are slipping into obscurity. For example, just about anyone can tell you that a 64-bit processor is faster than a 32-bit processor. A grade school child could tell you that a computer “speaks” in 1s and 0s. Some people can even tell you that there are 8 bits in a byte. However, I have found that even the most seasoned developers often can’t explain the theory behind those statements.

That is not a knock on programmers; in the age of IntelliSense, what reason do we have to work with data at the bit level? Many computer theory classes treat bit-level programming as a thing of the past, no longer necessary now that storage space is plentiful. The trouble with that mindset is that the world is full of legacy systems that run programs written in the 1970s. Today our jobs require us to extract data from those systems, regardless of the format, and that often involves low-level programming. Because knowledge of these low-level concepts seems to be waning, I thought a review would be in order.

CHARACTER: See Spot Run
HEX:       53 65 65 20 53 70 6F 74 20 52 75 6E
DECIMAL:   83 101 101 32 83 112 111 116 32 82 117 110
BINARY:    01010011 01100101 01100101 00100000 01010011 01110000 01101111 01110100 00100000 01010010 01110101 01101110

In this example, I have broken down the words “See Spot Run” all the way to binary, the level at which a computer actually stores them.

CHARACTER: The character level is what is rendered by the computer. A single-byte “Character Set” or “Code Page” contains up to 256 characters, both printable and unprintable. In such a character set, each character takes up 1 BYTE of data. For example, the character string “See Spot Run” is 12 bytes long, exclusive of the quotation marks. Remember, a SPACE leaves no visible mark, but it is still a character and still requires a byte.
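
The “See Spot Run” breakdown above can be reproduced in a few lines. This is only a minimal Python sketch, assuming the text is plain ASCII so that each character encodes to exactly one byte; the function name `breakdown` is my own, chosen for illustration.

```python
def breakdown(text):
    """Return the hex, decimal and binary rendering of each byte in text."""
    # Assumption: the input is plain ASCII, so one character is one byte.
    data = text.encode("ascii")
    return {
        "hex": [f"{b:02X}" for b in data],      # two hex digits per byte
        "decimal": [str(b) for b in data],      # the same byte in base 10
        "binary": [f"{b:08b}" for b in data],   # the same byte as 8 bits
    }


rows = breakdown("See Spot Run")
for label, values in rows.items():
    heading = label.upper() + ":"
    print(f"{heading:<9}", " ".join(values))
```

Counting the entries in any one row gives 12, one per character, including the two spaces.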
The “See Spot Run” example uses ASCII, which you can see here: http://www.asciitable.com/ Strictly speaking, ASCII defines only 128 characters; single-byte Windows code pages such as Windows-1252 build on it and use the upper 128 values for additional characters.

HEX: Hex is short for hexadecimal, or Base 16. Humans are comfortable thinking in base 10, perhaps because they have 10 fingers and 10 toes; fingers and toes are called digits, so it’s not much of a stretch. Computers do not actually think in Base 16. They work in binary, and hex is a compact shorthand for it in which each hex digit stands for exactly four bits. A hex digit ranges from zero to fifteen, written 0 to 9 and then A to F. Each digit position has 16 possible values, as opposed to 10 possible values in base 10. Therefore, the number 10 in hex is equal to the number 16 in decimal.

DECIMAL: The decimal conversion is strictly for us humans to use for calculations and conversions. It is much easier for us to calculate that [30 - 10 = 20] in decimal than it is to calculate [1E - A = 14] in hex. In the old days, an error in a program could be found by determining the displacement from the entry point of a module. Since those values were dumped from the computer’s head, they were in hex. A programmer needed to convert them to decimal, do the arithmetic and convert the result back to hex. This gets into relative and absolute addressing, a topic for another day.

BINARY: Binary is where any value can be expressed in 1s and 0s. It is really Base 2, because each digit position can hold only one of 2 values, a 1 or a 0. In binary, the number 10 is equal to the number 2 in decimal. Why only 1s and 0s? Very simply, computers are made up of lots and lots of transistors, each of which at any given moment is ON (1) or OFF (0). Loosely speaking, each of those switches is a bit, and the pattern of ON and OFF is what distinguishes one value from another in the computer’s head (or CPU).

Consider 32-bit vs 64-bit processing. A 64-bit processor can handle 64 bits at a time. A 32-bit processor can handle only half as many at a time, so in theory the 64-bit processor should be faster.
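
The base conversions described in the HEX, DECIMAL and BINARY sections can be checked with a short Python sketch. It relies only on the built-in `int()` and `format()`; the helper name `hex_subtract` is hypothetical, and it simply mimics the convert, calculate, convert-back routine described above.

```python
# The same digits mean different quantities in different bases.
assert int("10", 16) == 16   # "10" read as hex is sixteen
assert int("10", 2) == 2     # "10" read as binary is two


def hex_subtract(a, b):
    """Subtract two hex strings and return the result as a hex string."""
    difference = int(a, 16) - int(b, 16)   # convert to integers, do the arithmetic
    return format(difference, "X")         # convert back to uppercase hex


print(hex_subtract("1E", "A"))                      # prints 14
print(int("1E", 16), int("A", 16), int("14", 16))   # prints 30 10 20
```

The second line of output is the decimal version of the same sum, which is the one most of us would rather do in our heads.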
Many more factors are involved in CPU performance, but the number of bits handled at a time is the fundamental difference between a 32-bit and a 64-bit processor.

DECIMAL  HEX  BINARY
0        0    0000
1        1    0001
2        2    0010
3        3    0011
4        4    0100
5        5    0101
6        6    0110
7        7    0111
8        8    1000
9        9    1001
10       A    1010
11       B    1011
12       C    1100
13       D    1101
14       E    1110
15       F    1111

Remember that each character is a BYTE and there are 8 BITS in a byte. A byte is written as 2 HEX digits, and each hex digit covers 4 bits, a half-byte known as a nibble.

I hope you enjoyed reading about the theory of data processing. This is just a high-level explanation, and there is much more to be learned. It is safe to say that, no matter how advanced our programming languages and visual studios become, they are nothing more than a way to interpret bits and bytes. There is nothing like the joy of hex to get the mind racing.
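
As a postscript, the relationship between a byte, its two nibbles and the decimal/hex/binary table can be sketched in Python. The function name `nibbles` is my own for illustration, and the sketch assumes a single byte value in the range 0 to 255.

```python
def nibbles(byte):
    """Split one byte (0-255) into its high and low 4-bit halves."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value that fits in one byte")
    # Shift right by 4 for the high half; mask with 0x0F for the low half.
    return byte >> 4, byte & 0x0F


high, low = nibbles(ord("S"))    # "S" is 53 in hex
print(f"{high:X} {low:X}")       # prints 5 3
print(f"{high:04b} {low:04b}")   # prints 0101 0011

# Rebuild the decimal / hex / binary table for 0 through 15.
table = [(n, f"{n:X}", f"{n:04b}") for n in range(16)]
for dec, hx, bits in table:
    print(f"{dec:<9}{hx:<5}{bits}")
```

Each hex digit of “S” lines up with one row of the table: 5 is 0101 and 3 is 0011, which together give the 01010011 shown for “S” in the original breakdown.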