I am reading an Excel file using Pandas and I feel like there has to be a better way to handle the way I create column names. This is something like the Excel file I'm reading:
                1         2        # '1' is merged across the two cells above 'a' and 'b'
   Date         a    b    c    d   # likewise for '2' (merged cells, as opposed to 'centered across selection')
1  1-Jan-19    100  200  300  400
2  1-Feb-19    101  201  301  401
3  1-Mar-19    102  202  302  402
I want to merge the 'a', 'b', 'c', and 'd' column heads with the '1' and '2' above them, so I'm doing the following to get my headers the way I want:
import pandas as pd
import json

xls = pd.ExcelFile(r'C:\Path_to\Excel_Pandas_Connector_Test.xls')
df = pd.read_excel(xls, 'Sheet1', header=[1])  # uses the abcd row as column names

# I only want the most recent day of data so I do the following
json_str = df[df.Date == df['Date'].max()].to_json(orient='records', date_format='iso')
dat_data = json.loads(json_str)[0]

def clean_json():
    global dat_data
    dat_data['1a'] = dat_data.pop('a')
    dat_data['1b'] = dat_data.pop('b')
    dat_data['2c'] = dat_data.pop('c')
    dat_data['2d'] = dat_data.pop('d')

clean_json()
print(json.dumps(dat_data, indent=4))
My desired output is:
{
    "Date": "2019-03-01T00:00:00.000Z",
    "1a": 102,
    "1b": 202,
    "2c": 302,
    "2d": 402
}
This works as written, but is there a Pandas built-in that I could have used to do the same thing instead of the clean_json function?
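
To make the question concrete, here is a rough sketch of the kind of thing I'm hoping for. It assumes that reading both header rows (something like `header=[0, 1]`) gives two-level columns such as `('1', 'a')`, with a placeholder top level above `Date`. The placeholder name `Unnamed: 0_level_0` and the in-memory frame standing in for the Excel file are my assumptions, so the real result of `read_excel` may look different.

```python
import json

import pandas as pd

# Hypothetical stand-in for pd.read_excel(xls, 'Sheet1', header=[0, 1]).
# The placeholder top level above 'Date' is an assumption about what pandas returns.
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Date"),
    ("1", "a"), ("1", "b"),
    ("2", "c"), ("2", "d"),
])
df = pd.DataFrame(
    [
        [pd.Timestamp("2019-01-01"), 100, 200, 300, 400],
        [pd.Timestamp("2019-02-01"), 101, 201, 301, 401],
        [pd.Timestamp("2019-03-01"), 102, 202, 302, 402],
    ],
    columns=columns,
)

# Flatten the two header levels into one name per column.
# Columns whose top level is a placeholder keep only the lower-level name.
df.columns = [
    low if str(top).startswith("Unnamed") else f"{top}{low}"
    for top, low in df.columns
]

# Keep only the most recent day and convert it to a plain dict.
latest = df[df["Date"] == df["Date"].max()]
dat_data = json.loads(latest.to_json(orient="records", date_format="iso"))[0]
print(json.dumps(dat_data, indent=4))
```

If the single header row from `header=[1]` is kept instead, calling `df.rename(columns={'a': '1a', 'b': '1b', 'c': '2c', 'd': '2d'})` before `to_json` looks like it would also replace `clean_json`, though the mapping is still written by hand.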