2

Scenario: I have a list of files in a folder (including the file paths). I am trying to get the content of each of those files into a dataframe (one for each file), then further perform some operations and later merge these dataframes.

From various other questions in SO, I found multiple ways to iterate over the files in a folder and get the data, but all of those I found usually ready the files in a loop and concatenate them into a single dataframe automatically, which does not work for me.

For example:

import os
import pandas as pd
path = os.getcwd()
files = os.listdir(path)
files_xls = [f for f in files if f[-3:] == 'xls*']
df = pd.DataFrame()
for f in files_xls:
    data = pd.read_excel(f, 'Sheet1')
    df = df.append(data)

or

import pandas as pd
import glob
all_data = pd.DataFrame()
for f in glob.glob("*.xls*"):
    df = pd.read_excel(f)
    all_data = all_data.append(df,ignore_index=True)

The only piece of code I could put together from what I found is:

from os.path import isfile, join
import glob
mypath = "/DGMS/Destop/uploaded"
listoffiles = glob.glob(os.path.join(mypath, "*.xls*"))
contentdataframes = (pd.read_excel(f) for f in listoffiles)

This lines run without error, but they appear not to do anything, no variables or created nor changed.

Question: What am I doing wrong here? Is there a better way to do this?

2
  • Did you test your globing-statements, i.e. does the listofffiles contain any filepath or will it stay empty? You might simply add an print(listoffiles) statement to inspect the list. Commented Jul 9, 2018 at 9:53
  • @dorvak the listoffiles contains all the file names properly. If I get one of the entries and assign it to a dataframe manually, it works perfectly. The problem is, I am not being able to do this is a loop for all of them. Commented Jul 9, 2018 at 9:56

1 Answer 1

2

You are really close, need join all data by concat from generator:

contentdataframes = (pd.read_excel(f) for f in listoffiles)
df = pd.concat(contentdataframes, ignore_index=True)

If need list of DataFrames:

contentdataframes = [pd.read_excel(f) for f in listoffiles]
Sign up to request clarification or add additional context in comments.

7 Comments

That is the problem, I don't want to concatenate the contents at this point, I want to transform each item into the list of dataframes into a single dataframe, so I can edit them.
So contentdataframes = [pd.read_excel(f) for f in listoffiles] should work
@DGMS89 Your code created a generator object, so thats probably why it didnt work ;)
@dorvak I nevr know when to properly use () and [].
@jezrael Just one question, how could I split each member of this list into a separate dataframe? (so my editing is simplified)
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.