Parsing a Multi-Index Excel File in Pandas
- by rhaskett
I have a time series Excel file with a three-level column MultiIndex that I would like to parse, if possible. There are some results on Stack Overflow showing how to do this for the row index, but not for the columns, and the header argument of the parse function does not seem to accept a list of rows.
The Excel file looks like the following:
Column A holds all the time series dates, starting at A4
Column B has top_level1 (B1), mid_level1 (B2), low_level1 (B3), data (B4-B100+)
Column C has null (C1), null (C2), low_level2 (C3), data (C4-C100+)
Column D has null (D1), mid_level2 (D2), low_level1 (D3), data (D4-D100+)
Column E has null (E1), null (E2), low_level2 (E3), data (E4-E100+)
...
So there are two low_level values, many mid_level values, and a few top_level values. The trick is that the top- and mid-level cells are often null, and a null cell is assumed to take the value of the nearest non-null cell to its left. For instance, all of the columns above would have top_level1 as their top MultiIndex value.
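To make the fill rule concrete, here is a minimal sketch in plain Python. It is not the pandas solution I am after, only an illustration of the result I want for the column labels. The function name `fill_header_rows` and the sample values are made up for this example, and it assumes blank cells come through as `None`.

```python
def fill_header_rows(header_rows):
    """Fill blank header cells from the left, then build one tuple per column."""
    filled_rows = []
    for row in header_rows:
        filled = []
        last_seen = None  # most recent non-blank value in this row
        for cell in row:
            if cell is not None:
                last_seen = cell
            filled.append(last_seen)
        filled_rows.append(filled)
    # Transpose: one (top, mid, low) tuple per spreadsheet column
    return list(zip(*filled_rows))


# Rows 1-3 of columns B-E from the layout described above
header_rows = [
    ["top_level1", None, None, None],
    ["mid_level1", None, "mid_level2", None],
    ["low_level1", "low_level2", "low_level1", "low_level2"],
]

column_tuples = fill_header_rows(header_rows)
for column in column_tuples:
    print(column)
```

Tuples in this shape are what `pandas.MultiIndex.from_tuples` accepts, so the open question is really how to get from the sheet to this form inside pandas.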
My best idea so far is to use transpose, but then it fills in Unnamed: # everywhere and doesn't seem to work. In Pandas 0.13, read_csv seems to have a header parameter that can take a list, but this doesn't seem to work with parse.