Excel Regex, or export to Python? ; "Vlookup" in Python?
- by victorhooi
heya,
We have an Excel file with a worksheet containing people records.
1. Phone Number Sanitation
One of the fields is a phone number field, which contains phone numbers in the format e.g.:
+XX(Y)ZZZZ-ZZZZ
(where X, Y and Z are integers).
There are also some records which have less digits, e.g.:
+XX(Y)ZZZ-ZZZZ
And others with really screwed up formats:
+XX(Y)ZZZZ-ZZZZ / ZZZZ
or:
ZZZZZZZZ
We need to sanitise these all into the format:
0YZZZZZZZZ (or OYZZZZZZ with those with less digits).
2. Fill in Supervisor Details
Each person also has a supervisor, given as an numeric ID. We need to do a lookup to get the name and email address of that supervisor, and add it to the line.
This lookup will be firstly on the same worksheet (i.e. searching itself), and it can then fallback to another workbook with more people.
3. Approach?
For the first issue, I was thinking of using regex in Excel/VBA somehow, to do the parsing. My Excel-fu isn't the best, but I suppose I can learn...lol. Any particular points on this one?
However, would I be better off exporting the XLS to a CSV (e.g. using xlrd), then using Python to fix up the phone numbers?
For the second approach, I was thinking of just using vlookups in Excel, to pull in the data, and somehow, having it fall through, first on searching itself, then on the external workbook, then just putting in error text. Not sure how to do that last part.
However, if I do happen to choose to export to CSV and do it in Python, what's an efficient way of doing the vlookup? (Should I convert to a dict, or just iterate? Or is there a better, or more idiomatic way?)
Cheers,
Victor