1

I have two excel files:

2020-01-consumption.xlsx
2020-01-production.xlsx
print(pd.read_excel(2020-01-production.xlsx, index_col = 0).head(4))


                         build_1   build_2               build_3       build_4  ...  
date                                                                            ...
2020-01-01 00:00:00          1.2       4.2                   4.3           7.0  ...
2020-01-01 01:00:00          3.3       1.9                   5.3           3.5  ...
2020-01-01 02:00:00          4.1       2.7                   6.0           2.6  ...
2020-01-01 03:00:00          3.6       6.0                   7.1           7.2  ...



print(pd.read_excel(2020-01-consumption.xlsx, index_col = 0).head(4))


                         build_1   build_2               build_3       build_4  ...  
date                                                                            ...
2020-01-01 00:00:00          0.4       1.0                   0.1           1.0  ...
2020-01-01 01:00:00          0.3       0.9                   0.0           0.4  ...
2020-01-01 02:00:00          0.3       0.5                   0.0           0.4  ...
2020-01-01 03:00:00          0.1       0.5                   0.4           0.4  ...

Columns and indexes are the same. I'm trying to set up a for loop. So under these circumstances, I want to save each column as a json file. I want to change the structure of data to this:

with open(build_1.json, encoding="utf8") as f:  #The name of the new file to be created must be the column name.
    content = json.load(f)

print(content)

{'build_1': {  #The key is column name.
    'date': [2020-01-01 00:00:00, 2020-01-01 01:00:00, 2020-01-01 02:00:00, 2020-01-01 03:00:00 ...],  #index name as a key.
    'production': [1.2, 3.3, 4.1, 3.6 ...],  #excel name is changed as a key.
    'consumption': [0.4, 0.3, 0.3, 0.1 ...]}}  #excel name is changed as a key.

I have a lot of dataframe like production and consumption. I want to show only two as an example. How can I achieve this structure? Is this possible?

1 Answer 1

1

You can use concat with keys parameter for MultiIndex in columns:

df1 = pd.read_excel('2020-01-production.xlsx', index_col = 0)
df2 = pd.read_excel('2020-01-consumption.xlsx', index_col = 0)

df = pd.concat([df1, df2], keys=['production','consumption'], axis=1)
print (df)

                    production                         consumption          \
                       build_1 build_2 build_3 build_4     build_1 build_2   
date                                                                         
2020-01-01 00:00:00        1.2     4.2     4.3     7.0         0.4     1.0   
2020-01-01 01:00:00        3.3     1.9     5.3     3.5         0.3     0.9   
2020-01-01 02:00:00        4.1     2.7     6.0     2.6         0.3     0.5   
2020-01-01 03:00:00        3.6     6.0     7.1     7.2         0.1     0.5   

                                     
                    build_3 build_4  
date                                 
2020-01-01 00:00:00     0.1     1.0  
2020-01-01 01:00:00     0.0     0.4  
2020-01-01 02:00:00     0.0     0.4  
2020-01-01 03:00:00     0.4     0.4  

And then loop by second level, selecting by DataFrame.xs, if necessary convert datetimes to strings or soem another way like need, create dictionary and last write to file:

for lvl in df.columns.levels[1]:
    print (lvl)
    df1 = df.xs(lvl, axis=1, level=1).reset_index()
    df1['date'] = df1['date'].astype(str)
    d ={lvl: df1.to_dict(orient='list')}
    #print (d)
    
    with open(f'{lvl}.json', mode='w', encoding="utf8") as f:
        json.dump(d, f)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.