
I am trying to loop through all the JSON files (they all share the same format) in a given directory, extract specific fields from each file, append the results together, and save them as a CSV file.

I can achieve this with the following code:

    import os
    import pandas as pd

    allfiles = os.listdir('.')
    files = [f for f in allfiles if f.endswith('.json')]
    mydata=pd.DataFrame()
    for filename in files:
        # Read JSON file
        df = pd.read_json(filename)
        df=df.loc[:,['col1','col2', 'col3']].set_index('col1')
        mydata=mydata.append(df)
    mydata.to_csv('Result.csv')

For example, my original data in two files looks like:

       File 1                              File 2       
col1    col2    col3                col1    col2    col3
 A       B        C                   D       E        F

My code produces the output shown under "My Result" below. However, I want a blank line between the data from the two files when I append them, as in the "Target" table. What should I add to my code to make this happen?

       My Result                    Target      
col1    col2    col3        col1    col2    col3
   A      B      C             A      B      C
   D      E      F              
                               D      E      F

Thanks

  • I'm not sure if you can append an empty value. You can append NaN by df.append(pd.Series([np.nan]), ignore_index = True) Commented Nov 22, 2018 at 5:22
  • Like inserting a line between data from different files Commented Nov 22, 2018 at 5:30

1 Answer

To add an empty line to a file, just write the newline character '\n'.

In your case, you can try the following. After the line

    mydata=mydata.append(df)

add

    mydata=mydata.append(',,\n')

If you then open the CSV file in Notepad (or another text editor), you will see:

    A,B,C
    ,,
    D,E,F

and in Excel you will see the target layout shown in the question.
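A hedged alternative that applies the "write a newline" idea literally: instead of appending a string to the DataFrame, each file's frame can be written to the same open file handle, with a `"\n"` written between them. This is a minimal sketch, not the answerer's code. The helper name `write_with_blank_lines` and the in-memory sample frames are assumptions for illustration, and `col1` is kept as an ordinary column rather than the index.

```python
import io
import pandas as pd

def write_with_blank_lines(frames, out):
    """Write each DataFrame as CSV to the open text handle `out`,
    separated by one empty line. (Hypothetical helper name.)"""
    for i, df in enumerate(frames):
        if i > 0:
            out.write("\n")  # blank separator line between files
        # Emit the header only once, before the first file's rows.
        df.to_csv(out, header=(i == 0), index=False)

# Stand-ins for the frames that pd.read_json would return per file.
frames = [
    pd.DataFrame({"col1": ["A"], "col2": ["B"], "col3": ["C"]}),
    pd.DataFrame({"col1": ["D"], "col2": ["E"], "col3": ["F"]}),
]

buf = io.StringIO()  # a real run could use open('Result.csv', 'w', newline='')
write_with_blank_lines(frames, buf)
csv_text = buf.getvalue()
```

Note that the separator here is a truly empty line rather than `,,`. Excel shows an empty row either way, and `pd.read_csv` skips blank lines by default, so the file still reads back cleanly.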


3 Comments

I get an error: cannot concatenate object of type "<class 'str'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
Try creating a DataFrame with pd.DataFrame([[" ", " ", " "]]) and then appending that DataFrame.
I created a DataFrame: df1 = pd.DataFrame([[np.nan] * len(df.columns)], columns=df.columns, index=df.index.unique()); then ran mydata=mydata.append(df) followed by mydata=mydata.append(df1), and this works. Thanks.
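The approach settled on in the comments (an all-NaN row between files) can be sketched with `pd.concat`, which matters on newer pandas because `DataFrame.append` was deprecated in 1.4 and removed in 2.0. This is an illustrative sketch under assumptions: the helper name `stack_with_blank_rows` and the sample frames are hypothetical, and `col1` is kept as a regular column instead of being set as the index.

```python
import numpy as np
import pandas as pd

def stack_with_blank_rows(frames):
    """Concatenate frames, inserting one all-NaN row between
    consecutive frames. (Hypothetical helper name.)"""
    pieces = []
    for i, df in enumerate(frames):
        if i > 0:
            # One row of NaN with the same columns acts as the blank line.
            blank = pd.DataFrame(
                [[np.nan] * len(df.columns)], columns=df.columns, dtype=object
            )
            pieces.append(blank)
        pieces.append(df)
    return pd.concat(pieces, ignore_index=True)

# Stand-ins for the frames that pd.read_json would return per file.
frames = [
    pd.DataFrame({"col1": ["A"], "col2": ["B"], "col3": ["C"]}),
    pd.DataFrame({"col1": ["D"], "col2": ["E"], "col3": ["F"]}),
]

result = stack_with_blank_rows(frames)
csv_text = result.to_csv(index=False)  # NaN is written as an empty field
```

Collecting the pieces in a list and concatenating once also avoids re-copying the accumulated frame on every loop iteration, which is what repeated appends do.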
