I want to import one column of data from each of several sheets in a single Excel file and combine them into one large dataframe, with one column per sheet. Additionally, I want the name of each new column to be a string that is also taken from the Excel file.

I've tried a few different things, each with a different issue, but here is a start that works:

import pandas as pd


file = r'C:\Users\pazam\OneDrive\Desktop\neuromastCount\sf\Final_Raw.xlsx' #SF

path = r'C:\Users\pazam\OneDrive\Desktop\neuromastCount\sf' 



results_raw = pd.DataFrame()


for i in range(19): #19 sheets
    df = pd.read_excel(file, usecols='N',skiprows = range(0,37),nrows=36000,engine='openpyxl',header=None, sheet_name=i)
    trt = pd.read_excel(file, usecols='G',nrows=1,engine='openpyxl',header=None, sheet_name=i)

# then something that adds df to results_raw as a new column with the string in trt as column header



raw_csv = path+"/results_raw.csv"
results_raw.to_csv(raw_csv)

thanks!

  • Why not read all the worksheets in one go? You could then process the list of dataframes this creates to extract only the required column, and column name, from each sheet, and then use pd.concat to combine them all into one dataframe. Commented Nov 10, 2021 at 20:25
  • Oops, meant dictionary of dataframes not list. Commented Nov 10, 2021 at 20:31

2 Answers

This code will read all the sheets in the file into a dictionary of dataframes.

It will then create single-column dataframes, each consisting of the values from column N, with the column name coming from the first cell in column G.

Those dataframes will then be concatenated together using pd.concat.

import pandas as pd

file = 'Final_Raw.xlsx' #SF

df = pd.read_excel(file, sheet_name=None, header=None)

data = pd.concat([pd.DataFrame({v.iloc[0, 6]: v.iloc[:, 13]}) for k, v in df.items()], axis=1)

print(data)
      Col1    Col2    Col3
0    Data1  Data25  Data36
1    Data2  Data26  Data37
2    Data3  Data27  Data38
3    Data4  Data28  Data39
4    Data5  Data29  Data40
5    Data6  Data30  Data41
6    Data7  Data31  Data42
7    Data8  Data32  Data43
8    Data9  Data33  Data44
9   Data10  Data34  Data45
10  Data11  Data35  Data46
11  Data12     NaN  Data47
12  Data13     NaN     NaN
13  Data14     NaN     NaN
14  Data15     NaN     NaN
15  Data16     NaN     NaN
16  Data17     NaN     NaN
17  Data18     NaN     NaN
18  Data19     NaN     NaN
19  Data20     NaN     NaN
20  Data21     NaN     NaN
21  Data22     NaN     NaN
22  Data23     NaN     NaN
23  Data24     NaN     NaN
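
The one-liner above keeps the whole of column N, whereas the loop in the question skipped the first 37 rows and read at most 36000 rows. A minimal sketch of how the same row window might be applied after reading everything in one go; the helper name `combine_sheets` and the tiny in-memory frames (standing in for the result of `pd.read_excel(file, sheet_name=None, header=None)`) are assumptions made for illustration only:

```python
import pandas as pd


def combine_sheets(sheets, skip_rows=37, max_rows=36000):
    """Build one column per sheet: name from cell G1, values from column N."""
    columns = []
    for frame in sheets.values():
        name = frame.iloc[0, 6]  # first cell of column G
        values = frame.iloc[skip_rows:skip_rows + max_rows, 13]  # column N
        # Reset the index so every column starts at row 0 before aligning.
        columns.append(values.reset_index(drop=True).rename(name))
    return pd.concat(columns, axis=1)


def fake_sheet(label, n_rows):
    """Hypothetical stand-in for one worksheet read with header=None."""
    frame = pd.DataFrame([[None] * 14 for _ in range(n_rows)])
    frame.iloc[0, 6] = label
    frame[13] = list(range(n_rows))
    return frame


sheets = {"Sheet1": fake_sheet("TrtA", 6), "Sheet2": fake_sheet("TrtB", 4)}
result = combine_sheets(sheets, skip_rows=2, max_rows=3)
print(result)
```

With the real file, `sheets` would be the dictionary returned by `pd.read_excel` and the defaults would match the question's `skiprows` and `nrows`. Shorter sheets are padded with NaN, as in the output above.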

2 Comments

Thanks, this works perfectly! I'd like to ask some questions to better understand the syntax, if that's alright. Does {v.iloc[0,6]:v.iloc[:,13]} basically make a new dictionary that is then made into a dataframe by pd.DataFrame and ultimately concatenated by pd.concat? Also, sorry if this is a dumb question, but what is the purpose of 'k'?
Yes, that's basically what's happening. k from k, v is the key from the dictionary created when we read in all the sheets, in this case it will be the sheet name. In this scenario it doesn't really have a purpose, but you could have used it for something. For example, if you didn't have the value in column G you could have used it to uniquely identify each column of data.
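
As a rough illustration of that last point, the sketch below uses the dictionary key (the sheet name) as the column header instead of a cell value. The small in-memory frames and the `data_col` variable are hypothetical stand-ins; with the real workbook the dictionary would come from `pd.read_excel(file, sheet_name=None, header=None)` and the data column would be 13.

```python
import pandas as pd

# Stand-in for pd.read_excel(file, sheet_name=None, header=None)
sheets = {
    "Sheet1": pd.DataFrame({0: ["a", "b", "c"]}),
    "Sheet2": pd.DataFrame({0: ["d", "e"]}),
}
data_col = 0  # hypothetical; column N would be position 13

# k is the sheet name, v is that sheet's dataframe.
by_sheet = pd.concat(
    [v.iloc[:, data_col].rename(k) for k, v in sheets.items()], axis=1
)
print(by_sheet)
```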

Use read_excel with sheet_name=None to read all sheets:

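import pandas as pd

file = 'Final_Raw.xlsx'

# sheet_name=None returns a dictionary of dataframes keyed by sheet name
dfs = pd.read_excel(file, sheet_name=None)

# stacks the sheets on top of each other, indexed by sheet name
df = pd.concat(dfs)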
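
Note that passing the dictionary straight to `pd.concat` stacks the sheets vertically, which is not quite the one-column-per-sheet layout the question asks for. A small sketch of the difference, using made-up in-memory frames in place of the real workbook:

```python
import pandas as pd

# Stand-in for the dictionary returned by pd.read_excel(file, sheet_name=None)
dfs = {
    "Sheet1": pd.DataFrame({"x": [1, 2]}),
    "Sheet2": pd.DataFrame({"x": [3]}),
}

stacked = pd.concat(dfs)               # rows stacked; index is (sheet name, row)
side_by_side = pd.concat(dfs, axis=1)  # columns are (sheet name, column)

print(stacked)
print(side_by_side)
```

In both cases the dictionary keys become the outer level of a MultiIndex, so the sheet of origin stays recoverable.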
