Python Pandas Dataframe Append Rows

Question

I'm trying to append the DataFrame values as rows, but it's appending them as columns. I have 32 files, and from each one I would like to take the second column (called dataset_code) and append it. Instead it's creating 32 rows and 101 columns, when I would like 1 column and 3232 rows.

import pandas as pd
import os

source_directory = r'file_path'

df_combined = pd.DataFrame(columns=["dataset_code"])

for file in os.listdir(source_directory):
    if file.endswith(".csv"):
        # Read the new CSV to a dataframe.
        df = pd.read_csv(source_directory + '\\' + file)
        df = df["dataset_code"]
        df_combined = df_combined.append(df)

print(df_combined)

Are you sure the columns are the same? From the append docs: "Append rows of other to the end of this frame, returning a new object. Columns not in this frame are added as new columns." (DeepSpace, Aug 14, 2016 at 13:40)

Yes, when I subset df and print it, it prints the appropriate column. (user4974662, Aug 14, 2016 at 13:42)

Alicia Garcia-Raboso · Accepted Answer · Score 7 · 2016-08-14 15:29:51Z

You already have two perfectly good answers, but let me make a couple of recommendations.

If you only want the dataset_code column, pass usecols=['dataset_code'] to pd.read_csv instead of loading the whole file into memory only to subset the dataframe immediately.
Instead of appending to an initially empty dataframe, collect a list of dataframes and concatenate them in one fell swoop at the end. Appending rows to a pandas DataFrame is costly because each append builds a whole new DataFrame, so your approach creates 65 DataFrames: one at the beginning, one when reading each file, and one when appending each of those (plus 32 intermediate Series from the subsetting). The approach I am proposing creates only 33 (one per file plus the final concatenation), and it is the common idiom for this kind of importing.
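A minimal sketch of both recommendations on small in-memory CSVs, assuming two hypothetical "files" held as strings in `io.StringIO` and a hypothetical extra column named `other`: `usecols` parses only `dataset_code`, and a single `pd.concat` stacks the per-file frames as rows.

```python
import io

import pandas as pd

# Two tiny CSV "files" held in memory; the "other" column is hypothetical filler.
csv_texts = [
    "other,dataset_code\n1,A\n2,B\n",
    "other,dataset_code\n3,C\n4,D\n5,E\n",
]

dfs = []
for text in csv_texts:
    # usecols restricts parsing to the listed column, so "other" is never loaded.
    df = pd.read_csv(io.StringIO(text), usecols=["dataset_code"])
    dfs.append(df)

# One concatenation at the end instead of one full copy per appended file.
df_combined = pd.concat(dfs, ignore_index=True)
print(df_combined.shape)  # → (5, 1)
print(df_combined["dataset_code"].tolist())  # → ['A', 'B', 'C', 'D', 'E']
```

On current pandas this list-then-concat pattern is also the only option: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0.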

Here is the code:

import os
import pandas as pd

source_directory = r'file_path'

dfs = []
for file in os.listdir(source_directory):
    if file.endswith(".csv"):
        # Read only the column we need from each file.
        df = pd.read_csv(os.path.join(source_directory, file),
                         usecols=['dataset_code'])
        dfs.append(df)

# Stack all per-file DataFrames as rows in a single step.
df_combined = pd.concat(dfs)
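A self-contained variant of the code above that can be run end to end, assuming a temporary directory filled with hypothetical sample files; the helper name `combine_dataset_codes` is also hypothetical. It adds two small choices: `sorted()` so the row order does not depend on `os.listdir`'s arbitrary ordering, and `ignore_index=True` so the result gets a fresh 0..n-1 index instead of each file's index repeated.

```python
import os
import tempfile

import pandas as pd


def combine_dataset_codes(source_directory):
    """Stack the dataset_code column of every .csv in source_directory as rows."""
    dfs = []
    # sorted() makes the row order deterministic; os.listdir order is arbitrary.
    for file in sorted(os.listdir(source_directory)):
        if file.endswith(".csv"):
            path = os.path.join(source_directory, file)
            dfs.append(pd.read_csv(path, usecols=["dataset_code"]))
    # ignore_index=True renumbers rows 0..n-1 instead of repeating each file's index.
    return pd.concat(dfs, ignore_index=True)


# Usage: write three small hypothetical sample files and combine them.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        sample = pd.DataFrame({
            "id": range(4),
            "dataset_code": [f"f{i}_{j}" for j in range(4)],
        })
        sample.to_csv(os.path.join(tmp, f"part{i}.csv"), index=False)
    # A non-CSV file is skipped by the endswith check.
    with open(os.path.join(tmp, "notes.txt"), "w") as fh:
        fh.write("not a csv")
    result = combine_dataset_codes(tmp)

print(result.shape)  # → (12, 1)
```

Note that `pd.concat` raises a `ValueError` when given an empty list, so a directory with no CSV files needs a check before the concatenation.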

answered Aug 14, 2016 at 15:29


Comment (user4974662): Thank you Alberto, I changed yours to the accepted answer because it is the better solution.

Nehal J Wani · Score 3 · 2016-08-14 14:00:35Z

df["dataset_code"] is a Series, not a DataFrame. Since you want to append one DataFrame to another, you need to change the Series object to a DataFrame object.

>>> type(df)
<class 'pandas.core.frame.DataFrame'>
>>> type(df['dataset_code'])
<class 'pandas.core.series.Series'>

To make the conversion, do this:

df = df["dataset_code"].to_frame()
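Why the conversion matters: when the old `DataFrame.append` (pandas before 2.0) received a Series, it treated that Series as a single new row whose index labels became column labels, which produces the many-columns shape described in the question. A DataFrame, by contrast, is appended row-wise, matched on column names. The sketch below uses `pd.concat` because `append` no longer exists in pandas 2.x; the `.T` line only imitates the old Series behavior for comparison and is not what `append` literally called.

```python
import pandas as pd

s = pd.Series(["A", "B", "C"], name="dataset_code")

# to_frame() keeps the values as rows: 3 rows x 1 column named "dataset_code".
as_column = s.to_frame()
print(as_column.shape)  # → (3, 1)

# Roughly what append(Series) did: the Series becomes ONE row,
# and its index labels (0, 1, 2) become the column labels.
as_row = s.to_frame().T
print(as_row.shape)  # → (1, 3)
print(list(as_row.columns))  # → [0, 1, 2]

# Stacking DataFrames that share the "dataset_code" column grows the row count.
stacked = pd.concat([as_column, as_column], ignore_index=True)
print(stacked.shape)  # → (6, 1)
```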

edited Aug 14, 2016 at 14:00

answered Aug 14, 2016 at 13:57


Comment (user4974662): Hey Nehal, this worked, thank you! But why did it work? Can you help me understand?

Parfait · Score 3 · 2016-08-14 14:44:25Z

Alternatively, you can select the column with double square brackets, which returns a one-column DataFrame rather than a Series:

df = df[["dataset_code"]]
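A quick check on a hypothetical two-column frame that double-bracket selection and `.to_frame()` from the other answer give the same one-column DataFrame: a list of labels inside `[...]` asks for a DataFrame of those columns, while a single label returns a Series.

```python
import pandas as pd

# Hypothetical data standing in for one of the CSV files.
df = pd.DataFrame({"other": [1, 2], "dataset_code": ["A", "B"]})

single = df["dataset_code"]    # single label -> Series
double = df[["dataset_code"]]  # list of labels -> DataFrame

print(type(single).__name__)  # → Series
print(type(double).__name__)  # → DataFrame
print(double.equals(single.to_frame()))  # → True
```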

answered Aug 14, 2016 at 14:44

