0

I have a data.csv file contain 1 column call data trend in json type, each row of it is:

 1. [{"period":"11/2020", "UNTPRICE": 49940000,"DIST_UNTPRICE": 30789500},
     {"period":"12/2020", "UNTPRICE": 48710000,"DIST_UNTPRICE": 30719773}]
 2. [{"period":"12/2020", "UNTPRICE": 28540000,"DIST_UNTPRICE": 27428824}]
 3. [{"period":"12/2020", "UNTPRICE": 27428824,"DIST_UNTPRICE": 28540000}]

The question here is how to covert this column to a array like this in python

|UNTPRICE(11/2020)|DIST_UNTPRICE(11/2020)|UNTPRICE(12/2020)|DIST_UNTPRICE(12/2020)|
|-----------------|----------------------|-----------------|----------------------|
|     4994000     |        30789500      | 48710000        | 30719773             |
|     NULL        |        NULL          |28540000         |27428824              |
|     NULL        |        NULL          |27428824         |28540000              |

sample raw image of csv file enter image description here

4
  • 1
    Does it really have the numbers 1, 2, 3 in the csv? Please just post the raw contents of the csv. Commented Jan 12, 2021 at 7:20
  • Are you using pandas? It has a pivot function. Commented Jan 12, 2021 at 7:24
  • @Barmar can you explain in more detail Commented Jan 12, 2021 at 7:27
  • @bremen_matt i just edited Commented Jan 12, 2021 at 7:30

2 Answers 2

2

first of all, write a function to convert a row from the csv file to a row in the data frame:

import json

def csv_row_to_df_row(csv_row):
    csv_row = json.loads(csv_row)
    df_row = {}
    for entry in csv_row:
        period = entry['period']
        for k, v in entry.items():
            if k != 'period':
                df_row[f'{k}({period})'] = int(v)
    return df_row

then you can iterate all the lines in the file and add the rows to you data frame.

import pandas as pd

df = pd.DataFrame()
with open('yourfile.csv') as f:
    for csv_row in f:
        df_row = csv_row_to_df_row(csv_row)
        df = df.append(df_row, ignore_index=True)

to get the same order of columns as in the desired output:

df = df[['UNTPRICE(11/2020)', 'DIST_UNTPRICE(11/2020)', 'UNTPRICE(12/2020)', 'DIST_UNTPRICE(12/2020)']]
Sign up to request clarification or add additional context in comments.

7 Comments

Thanks for the answer but the output is not the same format that I wanted, it not contain NULL value in it
@Al3xSund3r When appending using pandas.DataFrame.append NaN is automatically added.
I want this added as new column, append not work in this case
@Al3xSund3r I've edited the answer, does it work now?
for example my csv file contain 5 column and json string in column 3, how can I do this
|
0

first load json using pandas

import pandas as pd
df = pd.read_json('data.json')

then transpose dataframe using:

df.T

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.