Step 1:
Write a function
int GetText(char[],int);
which fills a character array from a requested file. That is, the function should prompt the user to input the filename, and then read up to the number of characters given as the second argument, terminating when the number has been reached or when the end of file is encountered. The file should then be closed. The number of characters placed in the array is then returned as the value of the function. Every character in the file should be transferred to the array. Whitespace should not be removed. When testing, assume that no more than 5000 characters will be read.
The function should be placed in a file called coding.cpp while the main will be in ass5.cpp. To enable the prototypes to be accessible, the file coding.h contains the prototypes for all the functions that are to be written in coding.cpp for this assignment. (You may write other functions. If they are called from any of the functions in coding.h, they must appear in coding.cpp where their prototypes should also appear. Do not alter coding.h. Any other functions written for this assignment should be placed, along with their prototypes, with the main function.)
Step 2:
Write a function
int SimplifyText(char[],int);
which simplifies the text in the first argument, an array containing the number of characters as given in the second argument, by converting all alphabetic characters to lower case, removing all non-alpha characters, and replacing multiple whitespace by one blank. Any leading whitespace at the beginning of the array should be removed completely. The resulting number of characters should be returned as the value of the function. Note that another array cannot appear in the function (as the file does not contain one).
For example, if the array contained the 29 characters "The 39 Steps" by John Buchan (with the " appearing in the array), the simplified text would be the steps by john buchan of length 24. The array should not contain a null character at the end.
Step 3:
Using the file test.txt, test your program so far. You will need to write a function
void PrintText(const char[],int,int);
that prints out the contents of the array, whose length is the second argument, breaking the lines to exactly the number of characters in the third argument. Be warned that, if the array contains newlines (as it would when read from a file), lines will be broken earlier than the specified length.
Step 4:
Write a function
void Caesar(const char[],int,char[],int);
which takes the first argument array, with length given by the second argument and codes it into the third argument array, using the shift given in the fourth argument. The shift must be performed cyclicly and must also be able to handle negative shifts. Shifts exceeding 26 can be reduced by modulo arithmetic. (Is C++'s modulo operations on negative numbers a problem here?)
Demonstrate that the test file, as simplified, can be coded and decoded using a given shift by listing the original input text, the simplified text (indicating the new length), the coded text and finally the decoded text.
Step 5:
The permutation cypher does not limit the character substitution to just a shift. In fact, each of the 26 characters is coded to one of the others in an arbitrary way. So, for example, a might become f, b become q, c become d, but a letter never remains the same. How the letters are rearranged can be specified using a seed to the random number generator. The code can then be decoded, if the decoder has the same random number generator and knows the seed.
Write the function
void Permute(const char[],int,char[],unsigned long);
with the same first three arguments as Caesar above, with the fourth argument being the seed. The function will have to make up a permutation table as follows:
To find what a is coded as, generate a random number from 1 to 25. Add that to a to get the coded letter. Mark that letter as used. For b, generate 1 to 24, then step that many letters after b, ignoring the used letter if encountered. For c, generate 1 to 23, ignoring a or b's codes if encountered. Wrap around at z.
Here's an example, for only the 6 letters a, b, c, d, e, f.
For the letter a, generate, from 1-5, a 2. Then a - c. c is marked as used.
For the letter b, generate, from 1-4, a 3. So count 3 from b, skipping c (since it is marked as used) yielding the coding of b - f. Mark f as used.
For c, generate, from 1-3, a 3. So count 3 from c, skipping f, giving a. Note the wrap at the last letter back to the first.
And so on, yielding
a - c
b - f
c - a
d - b (it got a 2)
e - d
f - e
Thus, for a given seed, a translation table is required.
To decode a piece of text, we need the table generated to be re-arranged so that the right hand column is in order. In fact you can just store the table in the reverse way (e.g., if a gets encoded to c, put a opposite c is the table). Write a function called
void DePermute(const char[],int,char[], unsigned long);
to reverse the permutation cypher. Again, test your functions using the test file.
At this point, any main program used to test these functions will not be required as part of the assignment. The remainder of the assignment uses some of these functions, and needs its own main function. When submitted, all the above functions will be tested by the marker's own main function.
Step 6:
If the seed number is unknown, decoding is difficult.
Write a main program which:
(i) reads in a piece of text using GetText;
(ii) simplifies the text using SimplifyText;
(iii) prints the text using PrintText;
(iv) requests two letters to swap. If we think 'a' in the text should be 'q' we would type aq as input. The text would be modified by swapping the a's and q's, and the text reprinted. Repeat this last step until the user considers the text is decoded, when the input of the same letter twice (requesting a letter to be swapped with itself) terminates the program.
Step 7:
If we have a large enough sample of coded text, we can use knowledge of English to aid in finding the permutation. The first clue is in the frequency of occurrence of each letter.
Write a function
void LetterFreq(const char[],int,freq[]);
which takes the piece of text given as the first two arguments (same as above) and returns in the 26 long array of structs (the third argument), the table of the frequency of the 26 letters. This frequency table should be in decreasing order of popularity. A simple Selection Sort will suffice. (This will be described in lectures.) When printed, this summary would look something like
v x r s z j p t n c l h u o i b w d g e a q y k f m
168106 68 66 59 54 48 45 44 35 26 24 22 20 20 20 17 13 12 12 4 4 1 0 0 0
The formatting will require the use of input/output manipulators. See the header file for the definition of the struct called freq.
Modify the program so that, before each swap is requested, the current frequency of the letters is printed. This does not require further calls to LetterFreq, however. You may use the traditional order of regular letter frequencies (E T A I O N S H R D L U) as a guide when deciding what characters to exchange.
Step 8:
The decoding process can be made more difficult if blank is also coded. That is, consider the alphabet to be 27 letters. Rewrite LetterFreq and your main program to handle blank as another character to code. In the above frequency order, space usually comes first.