
I have several Excel files whose filenames differ only by the date they contain. I need to concatenate all of these files, using the date from each filename as the index. I have written the following code:

path = r"C:\\Users\\atcs\\Desktop\\data science\\files\\1-Danny Jones KPI's\\Source\\"                     
fileName =  glob.glob(os.path.join(path, "*.xlsx"))
df = (pd.read_excel(f, header=None, sheetname = "YTD Summary_4") for f in fileName)
k = (re.search("([0-9]{1,2}\-[0-9]{1,2}\-[0-9]{4})", fileName))
concatenated_df   = pd.concat(df, index=k)
concatenated_df.to_csv('tableau7.csv')

What I have done here is first define a directory, then assign all the .xlsx files in it to fileName. I read the files into df, used a regular expression to get the date from the filename, and assigned it to the variable k. I then concatenate the files to get the output CSV file. But the code gives an error: TypeError: expected string or bytes-like object. Can somebody tell me what I am doing wrong?

  • Hard to answer without data, but if k is a list of dates extracted from the filenames, then use concatenated_df = pd.concat(df, keys=k) Commented Sep 18, 2017 at 7:58
  • Trying very hard to understand the following. 1. fileName is not a string but a list. 2. df is a generator, not a list. 3. You are passing a regex match object when you should be passing a list or string... do you know Python or not? Commented Sep 18, 2017 at 8:01
  • My data contains strings as well as floats and integers. I think there might be some problem there. Any suggestions based on the error? Commented Sep 18, 2017 at 8:02
  • Nope. How about we see some data and you tell us what the heck it is you are trying to achieve with this monstrosity of code. Commented Sep 18, 2017 at 8:05
  • glob should just be a string with a wildcard in it. See post Commented Sep 18, 2017 at 8:12
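
The TypeError discussed in the comments comes from handing the whole list of file names to re.search, which accepts a single string. A minimal sketch of the difference follows; the file names and the helper extract_date are hypothetical and exist only for illustration.

```python
import re

# Same date shape as in the question: 1-2 digits, 1-2 digits, 4 digits.
DATE_PATTERN = re.compile(r"(\d{1,2}-\d{1,2}-\d{4})")


def extract_date(file_name):
    """Return the first date-like substring in file_name, or None."""
    match = DATE_PATTERN.search(file_name)
    return match.group(1) if match else None


# Hypothetical file names standing in for the result of glob.glob(...).
file_names = ["KPI 01-09-2017.xlsx", "KPI 8-9-2017.xlsx"]

# Searching the list itself raises the TypeError from the question.
try:
    DATE_PATTERN.search(file_names)
except TypeError as exc:
    error_message = str(exc)

# Searching each name separately gives one date string per file.
dates = [extract_date(name) for name in file_names]
```

A list built this way has one entry per file, so it can be passed as keys= to pd.concat, as the first comment suggests.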

2 Answers


You can use:

import glob
import pandas as pd

# simplify by adding *.xlsx to the path
path = r"C:\\Users\\atcs\\Desktop\\data science\\files\\1-Danny Jones KPI's\\Source\\*.xlsx"
fileName = glob.glob(path)
# create a list of DataFrames (current pandas spells the keyword sheet_name, older versions used sheetname)
dfs = [pd.read_excel(f, header=None, sheet_name="YTD Summary_4") for f in fileName]
# pass the filenames as keys, then remove the second level of the MultiIndex
concatenated_df = pd.concat(dfs, keys=fileName).reset_index(level=1, drop=True)
# extract the dates and convert to a DatetimeIndex
# (pass dayfirst=True to to_datetime if the filenames are day-month-year)
pat = r'([0-9]{1,2}-[0-9]{1,2}-[0-9]{4})'
concatenated_df.index = pd.to_datetime(concatenated_df.index.str.extract(pat, expand=False))
print(concatenated_df)
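
To try the keys approach without any Excel files, here is a small sketch that uses in-memory DataFrames in place of the workbooks. The file names, the value column, and the day-month-year reading of the dates are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the sheet read from each workbook.
frames = {
    "KPI 01-09-2017.xlsx": pd.DataFrame({"value": [1, 2]}),
    "KPI 15-09-2017.xlsx": pd.DataFrame({"value": [3, 4]}),
}

# keys= builds a two-level index: (file name, original row number).
combined = pd.concat(list(frames.values()), keys=list(frames.keys()))

# Drop the row-number level so only the file name remains in the index.
combined = combined.reset_index(level=1, drop=True)

# Pull the date text out of each file name and parse it.
# Assumption: the names are day-month-year, hence the explicit format.
date_text = combined.index.str.extract(r"(\d{1,2}-\d{1,2}-\d{4})", expand=False)
combined.index = pd.to_datetime(date_text, format="%d-%m-%Y")
combined.index.name = "date"
```

Giving to_datetime an explicit format avoids guessing between day-first and month-first for names such as 01-09-2017.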


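A small modification: add the date as a column while reading each file, then set that column as the index.

import glob
import re
import pandas as pd

path = r"C:\\Users\\atcs\\Desktop\\data science\\files\\1-Danny Jones KPI's\\Source\\*.xlsx"
fileName = glob.glob(path)
pat = r'([0-9]{1,2}-[0-9]{1,2}-[0-9]{4})'
l = []
for f in fileName:
    # current pandas spells the keyword sheet_name (older versions used sheetname)
    df = pd.read_excel(f, header=None, sheet_name="YTD Summary_4")
    # store only the date taken from the filename, not the whole path
    # (assumes every filename contains a date matching pat)
    df['date'] = re.search(pat, f).group(1)
    l.append(df)
concatenated_df = pd.concat(l).set_index('date')
concatenated_df.to_csv('tableau7.csv')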
