Using date as index in output file

Question

I have several Excel files whose filenames differ only by date. I need to concatenate all of these files, using the date from each filename as the index. I have written the following code:

path = r"C:\\Users\\atcs\\Desktop\\data science\\files\\1-Danny Jones KPI's\\Source\\"                     
fileName =  glob.glob(os.path.join(path, "*.xlsx"))
df = (pd.read_excel(f, header=None, sheetname = "YTD Summary_4") for f in fileName)
k = (re.search("([0-9]{1,2}\-[0-9]{1,2}\-[0-9]{4})", fileName))
concatenated_df   = pd.concat(df, index=k)
concatenated_df.to_csv('tableau7.csv')

What I have done here is first define a directory, then collect all the .xlsx files in it into fileName. I read each file into a DataFrame, used a regular expression to get the date from the filename, and assigned it to the variable k. Then I concatenate the files to produce the output CSV file. But the code gives an error: TypeError: expected string or bytes-like object. Can somebody tell me what I am doing wrong?

Hard to answer without data, but if k is a list of dates extracted from the filenames, then use concatenated_df = pd.concat(df, keys=k)
– jezrael, Commented Sep 18, 2017 at 7:58
Trying very hard to understand the following. 1. fileName is not a string but a list. 2. df is a generator, not a list. 3. You are passing a regex matcher object when you should be passing a list or string... do you know python or not?
– cs95, Commented Sep 18, 2017 at 8:01
My data contains strings as well as floats and integers; I think there might be some problem. Any suggestions, looking at the error?
– Shubzumt, Commented Sep 18, 2017 at 8:02
Nope. How about we see some data and you tell us what the heck it is you are trying to achieve with this monstrosity of code.
– cs95, Commented Sep 18, 2017 at 8:05
glob should just be a string with a wildcard in it. See post
– jwillis0720, Commented Sep 18, 2017 at 8:12
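
To illustrate the point made in the comments: `re.search` expects a single string, while `glob.glob` returns a list of paths, which is what triggers the TypeError. The following is only a minimal sketch; the file names are made up for illustration and are not the asker's actual files. It applies the pattern to one path at a time instead:

```python
import re

# Hypothetical file names standing in for the output of glob.glob(...)
file_names = [
    r"C:\data\KPI 18-9-2017.xlsx",
    r"C:\data\KPI 1-10-2017.xlsx",
]

pattern = re.compile(r"([0-9]{1,2}-[0-9]{1,2}-[0-9]{4})")

# Searching the whole list at once fails, because a list is not a string.
try:
    pattern.search(file_names)
except TypeError as exc:
    error_message = str(exc)

# Searching one path at a time yields one date string per file.
dates = [pattern.search(name).group(1) for name in file_names]
print(dates)
```

A list like `dates` has one entry per file, so it can be passed to `pd.concat(..., keys=dates)` as suggested in the first comment, provided the DataFrames are in the same order.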

jezrael · Accepted Answer · 2017-09-18 08:21:54Z

1

You can use:

import glob
import pandas as pd

# put the *.xlsx wildcard directly in the path, so os.path.join is not needed
path = r"C:\\Users\\atcs\\Desktop\\data science\\files\\1-Danny Jones KPI's\\Source\\*.xlsx"
fileName = glob.glob(path)
# create a list of DataFrames (sheet_name replaces the older sheetname keyword from pandas 0.21 on)
dfs = [pd.read_excel(f, header=None, sheet_name="YTD Summary_4") for f in fileName]
# pass the filenames as keys, then drop the second level of the MultiIndex
concatenated_df = pd.concat(dfs, keys=fileName).reset_index(level=1, drop=True)
# extract the dates from the filenames and convert them to a DatetimeIndex
pat = r'([0-9]{1,2}-[0-9]{1,2}-[0-9]{4})'
concatenated_df.index = pd.to_datetime(concatenated_df.index.str.extract(pat, expand=False))
print(concatenated_df)
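
The same steps can be tried without any Excel files. This is a rough sketch under stated assumptions: the file names and the small in-memory DataFrames are invented stand-ins for the real sheets, and the dates are assumed to be written day-month-year, which the question does not confirm.

```python
import pandas as pd

# Hypothetical file names and tiny frames standing in for the Excel sheets
file_names = ["report 18-09-2017.xlsx", "report 19-09-2017.xlsx"]
dfs = [
    pd.DataFrame({0: ["a", "b"], 1: [1.0, 2.0]}),
    pd.DataFrame({0: ["c"], 1: [3.0]}),
]

# keys= builds a MultiIndex of (file name, original row number);
# dropping level 1 leaves only the file name as the index.
combined = pd.concat(dfs, keys=file_names).reset_index(level=1, drop=True)

# Pull the date out of each file name and parse it as day-month-year
# (the day-first order is an assumption about the naming scheme).
pat = r"([0-9]{1,2}-[0-9]{1,2}-[0-9]{4})"
date_strings = combined.index.str.extract(pat, expand=False)
combined.index = pd.to_datetime(date_strings, format="%d-%m-%Y")
print(combined)
```

Passing an explicit `format` avoids pandas guessing whether a string such as 1-10-2017 means 1 October or 10 January.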



jwillis0720 · Accepted Answer · 2017-09-18 08:08:33Z

0

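A small modification:

import glob
import pandas as pd

path = r"C:\\Users\\atcs\\Desktop\\data science\\files\\1-Danny Jones KPI's\\Source\\*.xlsx"
fileName = glob.glob(path)
l = []
for f in fileName:
    # sheet_name replaces the older sheetname keyword from pandas 0.21 on
    df = pd.read_excel(f, header=None, sheet_name="YTD Summary_4")
    # note: this stores the full file path; extract the date from f if only the date is wanted
    df['date'] = f
    l.append(df)
# stack all frames and use the 'date' column as the index
concatenated_df = pd.concat(l).set_index('date')
concatenated_df.to_csv('tableau7.csv')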
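
As written, the `date` column above holds the whole file path. One possible variation, sketched here with invented file names and in-memory DataFrames rather than real Excel files, stores only the date part found by a regular expression and falls back to the raw name when no date is present:

```python
import re
import pandas as pd

# Hypothetical file names mapped to tiny frames standing in for the Excel sheets
frames_by_file = {
    "report 18-09-2017.xlsx": pd.DataFrame({"value": [1.0, 2.0]}),
    "report 19-09-2017.xlsx": pd.DataFrame({"value": [3.0]}),
}

pattern = re.compile(r"[0-9]{1,2}-[0-9]{1,2}-[0-9]{4}")

parts = []
for name, frame in frames_by_file.items():
    match = pattern.search(name)
    # Use the matched date if there is one, otherwise keep the raw file name
    date_label = match.group(0) if match else name
    parts.append(frame.assign(date=date_label))

combined = pd.concat(parts).set_index("date")
# Calling to_csv() without a path returns the CSV text instead of writing a file
csv_text = combined.to_csv()
print(csv_text)
```

Here the index values stay as plain strings; wrapping them in `pd.to_datetime` would be a further step if a DatetimeIndex is needed.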

