Importing data from multiple files using Python

Question

I have to build an application that imports all the Excel files in a given folder and combines them into a single dataframe. The dataframe should look like this:

Expected Data Frame

As seen in the image, each file's name is used as a column header in the dataframe.

I have successfully added those columns to the final dataframe with the following code:


import os
import pandas as pd

path = 'C:/Users/Administrator/Desktop/Zerodha/Day2'
lst = os.listdir(path)
files = [os.path.join(path, x) for x in lst]
print(lst)
dataframes_lst = []

for file in files:
    # basename handles both '/' and '\' path separators
    filename = os.path.basename(file)
    # column 0 is the date, column 4 is the value; the value column is named after the file
    dataframe = pd.read_csv(file, usecols=[0, 4], names=["date", filename], index_col=["date"])
    dataframes_lst.append(dataframe)

# one column per file, aligned on the date index
df = pd.concat(dataframes_lst, axis=1)

print(df)

df.to_csv('data.csv')

The dataframe obtained with this code is as displayed:

For reference, here is a snippet of one of the Excel files:

Excel snippet

Also, as seen, the result contains many NaN values. I tried to remove them with dropna(inplace=True) and also by following the suggestions in this post:

How to drop rows of Pandas DataFrame whose value in a certain column is NaN

But the resulting dataframe still contains the NaN values.

I have solved the NaN issue by using the fillna function.

– Huzefa Sadikot, Commented Nov 19, 2020 at 9:22
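
A small sketch of the NaN handling discussed above, using made-up data (the values and dates below are hypothetical, not taken from the asker's files). Note that dropna is a method of a DataFrame, so it is called on df rather than on pd. With its default settings it drops every row that has at least one NaN, which may or may not be what is wanted when the files cover different dates:

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like the concatenated result:
# one column per file, one row per date, NaN where a file has no row for that date.
df = pd.DataFrame(
    {"a.csv": [10.0, 11.0, np.nan], "b.csv": [np.nan, 20.0, 21.0]},
    index=["2020-11-16", "2020-11-17", "2020-11-18"],
)

# Keep only the dates that are present in every file.
complete_rows = df.dropna()

# Drop a date only if it is missing from every file.
non_empty_rows = df.dropna(how="all")

# Keep all dates and replace NaN with a placeholder value (0 is an arbitrary choice).
filled = df.fillna(0)
```

All three calls return a new dataframe and leave df unchanged; whether dropping or filling is appropriate depends on what a missing date means in the data.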

Orkun Berk Yuzbasioglu · Accepted Answer · 2020-11-18 10:14:09Z

2

Regarding

My doubt is that how do I loop through all the files in the directory and extract data of each file in the required format

You can loop through all the files in the directory and use each filename as the column header for that file's data, like this:

import pandas as pd
import os

path = './data'
lst = os.listdir(path)
files = [os.path.join(path, el) for el in lst]

and the structure of example.xlsx is:

dataframes_lst = []

for file in files:
    filename = os.path.basename(file)  # works with both '/' and '\' separators
    dataframe = pd.read_excel(file, usecols=[3], names=[filename])
    dataframes_lst.append(dataframe)

df = pd.concat(dataframes_lst, axis=1)
print(df)

Here, the dataframes are concatenated along axis=1, so each file contributes one column to df, headed by its filename.
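
As a rough sketch, the same loop could be adapted to the CSV layout in the question. It assumes each file has no header row, with the date in column 0 and the value of interest in column 4 (taken from the asker's usecols=[0, 4]). The function name combine_files is hypothetical.

```python
import os
import pandas as pd


def combine_files(folder):
    """Read every .csv file in `folder`; return one column per file, indexed by date."""
    columns = []
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".csv"):
            continue  # skip anything that is not a CSV file
        frame = pd.read_csv(
            os.path.join(folder, name),
            header=None,     # assumption: the files have no header row
            usecols=[0, 4],  # assumption: date in column 0, value in column 4
        )
        frame.columns = ["date", name]  # name the value column after the file
        columns.append(frame.set_index("date"))
    # axis=1 aligns the files on the date index; a date missing from a file becomes NaN
    return pd.concat(columns, axis=1)
```

Calling combine_files on the asker's folder would then give one column per file. Because the columns are aligned on the date index, any date that does not appear in every file produces NaN in the columns of the files that lack it.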

edited Nov 18, 2020 at 10:14

answered Nov 18, 2020 at 9:44

Orkun Berk Yuzbasioglu


1 Comment

Huzefa Sadikot Over a year ago

Thanks for your time. This is the solution I was looking for.

skarit · Answer · 2020-11-18 09:20:57Z

0

Try this:

import pandas as pd
from pathlib import Path

read_path = Path('C:/Users/Administrator/Desktop/Zerodha/Day2')

df = pd.concat([pd.read_csv(path) for path in read_path.glob('*.csv')])

If you want to read from Excel instead, use pd.read_excel and change the pattern to '*.xlsx'.
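
Note that pd.concat defaults to axis=0, so the one-liner above stacks the files on top of each other. A possible variation that keeps the pathlib approach but produces one column per file is sketched below. It assumes headerless CSV files with the date in column 0 and the value in column 4, as in the question; read_folder is a hypothetical helper name.

```python
from pathlib import Path

import pandas as pd


def read_folder(folder, pattern="*.csv"):
    """Return one column per matching file, keyed by filename and indexed by date."""
    series_by_file = {}
    for p in sorted(Path(folder).glob(pattern)):
        # With header=None the selected columns keep their positions (0 and 4) as labels.
        frame = pd.read_csv(p, header=None, usecols=[0, 4])
        series_by_file[p.name] = frame.set_index(0)[4]
    # Passing a dict to concat uses its keys as the column names.
    return pd.concat(series_by_file, axis=1).rename_axis("date")
```

Passing a dict instead of a list is just one way to attach the filenames; the keys argument of pd.concat would work similarly.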

answered Nov 18, 2020 at 9:20

skarit


1 Comment

Huzefa Sadikot Over a year ago

I am getting the data but not as expected in the Expected Data Frame. Your code gives me the data in the format mentioned in the Excel snippet. I need it in the format as mentioned in the Expected Data Frame
