
I have multiple csv files in a folder. The objective is to append the csv files into a single pandas DataFrame.

The question is: how can we use pandas to concatenate all the files in the folder while associating a specific key with each piece of the resulting DataFrame, via the keys argument of pd.concat?

This means that we can now select out each chunk by key:
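As a minimal sketch of what the keys argument does, using two small in-memory frames in place of the csv files (the frame contents mirror the example files below):

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1'], 'C': ['C0', 'C1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3'], 'C': ['C2', 'C3']})

# keys= labels each piece, producing a MultiIndex on the result
combined = pd.concat([df1, df2], keys=['Book1', 'Book2'])

# each chunk can now be selected by its key
print(combined.loc['Book1'])
```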

For example, given two csv files in a folder, each with three columns (A, B, C) and two rows.

CSV File: Book1

A0 B0 C0

A1 B1 C1

and

CSV File: Book2

A2 B2 C2

A3 B3 C3

The expected frame looks like this:

             A   B   C
Book1   0   A0  B0  C0
        1   A1  B1  C1
Book2   0   A2  B2  C2
        1   A3  B3  C3

Notice the index Book1 and Book2 in the left column; each name comes from the corresponding csv file.

So far, I have the following code

import glob

# match the pattern 'csv' in the folder
extension = 'csv'
all_filenames = glob.glob('*.{}'.format(extension))

But what do I need to change in the following line of code to achieve this objective?

combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])

The reason for adding these keys is to make access easier later. That would typically be done with:

.loc['Book1']

3 Answers


You can add an extra column to each dataframe using the assign method; this can be done after each file is read and before the frames are concatenated:

combined_csv = pd.concat([pd.read_csv(f).assign(name=f) for f in all_filenames ])

This adds a name column whose values are all equal to the file name f.

Once all the datasets are concatenated, you can set a MultiIndex:

combined_csv.reset_index(drop=True, inplace=True)

combined_csv.set_index(['name', combined_csv.index], inplace=True)
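Putting the pieces of this answer together, a self-contained sketch (the temporary folder and the Book1.csv/Book2.csv file names are for illustration only):

```python
import glob
import os
import tempfile

import pandas as pd

# write two sample csv files into a temporary folder
tmpdir = tempfile.mkdtemp()
pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1'], 'C': ['C0', 'C1']}).to_csv(
    os.path.join(tmpdir, 'Book1.csv'), index=False)
pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3'], 'C': ['C2', 'C3']}).to_csv(
    os.path.join(tmpdir, 'Book2.csv'), index=False)

all_filenames = sorted(glob.glob(os.path.join(tmpdir, '*.csv')))

# tag each frame with its file name (folder and extension stripped), then concat
combined_csv = pd.concat(
    [pd.read_csv(f).assign(name=os.path.splitext(os.path.basename(f))[0])
     for f in all_filenames])

# promote the name column (plus the row position) to a MultiIndex
combined_csv.reset_index(drop=True, inplace=True)
combined_csv.set_index(['name', combined_csv.index], inplace=True)

print(combined_csv.loc['Book1'])
```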

5 Comments

And thereafter: combined_csv.set_index("name")
Thanks for the quick response, but I would prefer to make the key the index rather than creating another column, for easier access using .loc, as shown in this link. I appreciate your time and suggestion, though.
See the comment above, it addresses your particular need
Hi @SIA, may I know if there is another way, instead of creating a new column as you suggested?
I believe your goal is to create a MultiIndex dataframe, so one way or another you need to add a second index level. Adding a column and later setting it as the index is the way I am aware of.

Find the code below:

import pandas as pd

dfs = []
for f in all_filenames:
    df = pd.read_csv(f)
    df['index_name'] = f.split('.')[0]  # file name without extension
    dfs.append(df)

df_combined = pd.concat(dfs)
df_combined.set_index('index_name', inplace=True)
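A runnable sketch of the same approach, with in-memory frames standing in for the files read from all_filenames (the Book1/Book2 names are assumptions for illustration):

```python
import pandas as pd

# stand-ins for the frames that pd.read_csv would return per file
frames = {
    'Book1': pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1'], 'C': ['C0', 'C1']}),
    'Book2': pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3'], 'C': ['C2', 'C3']}),
}

dfs = []
for name, df in frames.items():
    df = df.copy()
    df['index_name'] = name  # mimics f.split('.')[0]
    dfs.append(df)

df_combined = pd.concat(dfs)
df_combined.set_index('index_name', inplace=True)

# rows for one file can now be fetched by label
print(df_combined.loc['Book1'])
```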

3 Comments

Hi, thanks for the quick reply, but I would prefer to make the key the index rather than creating another column, for easier access using .loc, as shown in this link. I appreciate your time and suggestion, though.
With the above code, you can use .loc to fetch the data for a particular index, right?
Yes, you are right, using: df_combined.loc[df_combined['index_name']=='Book1'].

You could create a dataframe for each file, add a column recording which book it came from, and then append it to the combined_csv dataframe.

books = ['book1', 'book2', ..., 'bookn']

i = 1

combined_csv = pd.DataFrame(columns=['Book', 'A', 'B', 'C'])

for book in books:
    data = pd.read_csv('book{}.csv'.format(i))
    data.insert(0, 'Book', 'Book{}'.format(i))
    combined_csv = combined_csv.append(data, ignore_index=True)
    i += 1

combined_csv.set_index('Book', inplace=True)

Let me know if this helps.
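Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on current pandas the same idea is usually written by collecting the frames and concatenating once (in-memory frames stand in for the book{}.csv files here):

```python
import pandas as pd

# stand-ins for pd.read_csv('book1.csv'), pd.read_csv('book2.csv'), ...
frames = [
    pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1'], 'C': ['C0', 'C1']}),
    pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3'], 'C': ['C2', 'C3']}),
]

labelled = []
for i, data in enumerate(frames, start=1):
    data = data.copy()
    data.insert(0, 'Book', 'Book{}'.format(i))  # record which book it came from
    labelled.append(data)

combined_csv = pd.concat(labelled, ignore_index=True)
combined_csv.set_index('Book', inplace=True)
print(combined_csv)
```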

2 Comments

Thanks for the quick response, but your suggestion does not answer the OP's question.
See my edit, if this does not do what you want then feel free to ignore.
