How to append Dataframe by rows in Python

Question

I would like to merge some pandas DataFrames by rows using df.append(). The code below starts by reading all the JSON files in the input directory json_dir_path. From each one it reads input_fn = json_data["accPreparedCSVFileName"], which contains the full path where the CSV file is stored, and loads that file into the DataFrame df_i. When I merge with df_output = df_i.append(df_output), I do not obtain the desired result.

    def __merge(self, json_dir_path):
        if os.path.exists(json_dir_path):
            filelist = [f for f in os.listdir(json_dir_path)]

            df_output = pd.DataFrame()
            for json_fn in filelist:
                json_full_name = os.path.join(json_dir_path, json_fn)
                # print("[TrainficationWorkflow::__merge] We are merging the json file ", json_full_name)
                if os.path.exists(json_full_name):
                    with open(json_full_name, 'r') as in_json_file:
                        json_data = json.load(in_json_file)
                        input_fn = json_data["accPreparedCSVFileName"]
                        df_i = pd.read_csv(input_fn)
                        df_output = df_i.append(df_output)
            return df_output
        else:
            return pd.DataFrame(data=[], columns=self.DATA_FORMAT)

Only 2 of the 12 files get merged. What am I doing wrong?

Any help would be much appreciated.

Best Regards, Carlo

Please go through pandas.pydata.org/pandas-docs/stable/merging.html to understand how concat, append and merge work on pandas DataFrames.
– Shrinivas Deshmukh, Commented May 3, 2018 at 18:13

Vikash Singh · Accepted Answer · 2018-05-03 18:16:23Z

1

You can also set ignore_index=True when appending.

    df_output = df_i.append(df_output, ignore_index=True)

You can also concatenate the dataframes:

    df_output = pd.concat((df_output, df_i), axis=0, ignore_index=True)

As @jpp suggested in his answer, you can load the dataframes into a list and concatenate them in one go.
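To make the effect of ignore_index concrete, here is a minimal sketch with two made-up frames (df_a and df_b are illustrative names, not from the question). It uses pd.concat rather than df.append, because DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the append call only runs on older versions.

```python
import pandas as pd

# Two small illustrative frames; the names and values are invented for this sketch.
df_a = pd.DataFrame({"x": [1, 2]})
df_b = pd.DataFrame({"x": [3, 4]})

# Without ignore_index each frame keeps its own row labels, so labels repeat.
kept = pd.concat([df_a, df_b], axis=0)

# With ignore_index=True the result gets a fresh 0..n-1 index.
reset = pd.concat([df_a, df_b], axis=0, ignore_index=True)

print(kept.index.tolist())   # [0, 1, 0, 1]
print(reset.index.tolist())  # [0, 1, 2, 3]
```

Either way, all four rows are present. Only the row labels differ, which matters if you later select rows by label.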

edited May 3, 2018 at 18:16

answered May 3, 2018 at 17:49


7 Comments

Carlo Allocca Over a year ago

Thanks for your answer. Sure. Do you agree that my code is also correct?

Vikash Singh Over a year ago

Yes, your code looks correct as far as I can see. Can you print json_full_name inside the for loop and check whether all 12 file names get printed?

Carlo Allocca Over a year ago

Yes, I did, and it does. Moreover, I compared the number of rows I get with your approach and with mine: they are the same.

Vikash Singh Over a year ago

And does the number of rows not match the sum of rows from the individual files?

Carlo Allocca Over a year ago

It does. That is why I came to the conclusion that my code is also correct.


jpp · Accepted Answer · 2018-05-03 17:52:14Z

1

I strongly recommend you do not concatenate dataframes in a loop.

It is much more efficient to store your dataframes in a list, then concatenate items of your list in one call. For example:

    # Here input_fn stands for an iterable of CSV paths,
    # not the single path used in the question.
    lst = []

    for fn in input_fn:
        lst.append(pd.read_csv(fn))

    df_output = pd.concat(lst, ignore_index=True)
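Applied to the question's setup, the list-then-concat pattern could look like the sketch below. It is written under assumptions, not as the asker's actual method: merge_prepared_csvs is a hypothetical standalone function, it returns a plain empty DataFrame when nothing is found (the original used self.DATA_FORMAT for the columns), and it sorts the file names only to make the row order predictable. The guard before pd.concat is needed because concatenating an empty list raises a ValueError.

```python
import json
import os

import pandas as pd


def merge_prepared_csvs(json_dir_path):
    """Stack the CSV files referenced by every JSON file in json_dir_path."""
    if not os.path.isdir(json_dir_path):
        return pd.DataFrame()

    frames = []
    for json_fn in sorted(os.listdir(json_dir_path)):
        json_full_name = os.path.join(json_dir_path, json_fn)
        if not os.path.isfile(json_full_name):
            continue
        with open(json_full_name, "r") as in_json_file:
            json_data = json.load(in_json_file)
        # Each JSON file holds the full path of one prepared CSV file.
        frames.append(pd.read_csv(json_data["accPreparedCSVFileName"]))

    if not frames:
        # pd.concat raises ValueError on an empty list, so return early.
        return pd.DataFrame()

    merged = pd.concat(frames, ignore_index=True)
    # Row-count check discussed in the comments: nothing should be lost.
    assert len(merged) == sum(len(f) for f in frames)
    return merged
```

Calling merge_prepared_csvs(json_dir_path) returns one frame with a fresh index. If only some files seem to be merged, comparing len(merged) with the per-file row counts, as the function does internally, is a quick way to confirm whether rows are really missing.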

answered May 3, 2018 at 17:52

