Excel string manipulation to check data consistency
- by chefsmart
Background information: - There are nearly 7000 individuals and there is data about their performances in one, two or three tests.
Every individual has taken the 1st test (let's call it Test M). Some of those who have taken Test M have also taken Test I, and some of those who have taken Test I have also taken Test B.
For the first two tests (M and I), students can score grades I, II, or III. Depending on the grades they are awarded points -- 3 for grade I, 2 for II, 1 for III.
The last Test B is just a pass or a fail result with no grades. Those passing this test get 1 point, with no points for failure. (Well actually, grades are awarded, but all grades are given a common 1 point).
An amateur has entered data to represent these students and their grades in an Excel file. Problem is, this person has done the worst thing possible - he has developed his own notation and entered all test information in a single cell --- and made my life hell.
The file originally had two text columns, one for individual's id, and the second for test info, if one could call it that.
It's horrible, I know, and I am suffering. In the image, if you see "M-II-2 I-III-1" it means the person got grade II in Test M for 2 points and grade III in Test I for 1 point. Some have taken only one test, some two, and some three.
When the file came to me for processing and analyzing the performance of students, I sent it back with instructions to insert 3 additional columns with only the grades for the three tests. The file now looks as follows. Columns C and D represent grades I, II, and III using 1,2 and 3 respectively. Column C is for Test M, column D for Test I. Column E says BA (B Achieved!) if the individual has passed Test B.
Now that you have the above information, let's get to the problem. I don't trust this and want to check whether data in column B matches with data in columns C,D and E.
That is, I want to examine the string in column B and find out whether the figures in columns C,D and E are correct.
All help is really appreciated.
P.S. - I had exported this to MySQL via ODBC and that is why you are seeing those NULLs. I tried doing this in MySQL too, and really will accept a MySQL or an Excel solution, I don't have a preference.
Edit : - See file with sample data