Pandas read excel files in folder and Unpivot columns into Dataframe

Question

I have about (100 files +) XLSX files in a folder with different columns names and data types

File 1:

Id  test  category
1   ab      4
2   cs      3
3   cs      1

FILE 2:

index  remove  stocks  category
1      dr      4         a
2      as      3         b
3      ae      1         v

File 3: ....

File 4.....

This is my try based on another example:

    #  current directory (including python script & all excel files)
    mydir = (os.getcwd()).replace('\\','/') + '/'
    
    #Get all excel files include subdir
    filelist=[]
    for path, subdirs, files in os.walk(mydir):
        for file in files:
            if (file.endswith('.xlsx') or file.endswith('.xls') or file.endswith('.XLS')):
                filelist.append(os.path.join(path, file))
    number_of_files=len(filelist)
    print(filelist)

# Read all excel files and save to dataframe (df[0] - df[x]),
# x is the number of excel files that have been read - 1


df=[]
for i in range(number_of_files):
    try:
        df.melt(pd.read_excel(r''+filelist[i]))
    except:
        print('Empty Excel File')
print(df)

RESULTS:

Empty Excel File
Empty Excel File
Empty Excel File
Empty Excel File
[]

How I could unpivot the data instead of "appending" the data in columns?

I want to unpivot all my files data to this dataframe format.

Dataframe:

Id    1
Id    2
Id    3
test  ab
test  cs
test  cs
category 4
category 3
category 1
index    1
index    1
index    1
remove   dr
remove   as
remove   ae
stocks   4
stocks   3
stocks   1
category a
category b
category v

Have you tried the melt method? Does exactly what you are looking to do I think. — Rasmus
– Rasmus, Commented Jul 6, 2022 at 8:15
Can you post your complete script as well as an example of what you dataframe looks like after concatenating it from the files? — Rasmus
– Rasmus, Commented Jul 6, 2022 at 8:21

Corralien · Accepted Answer · 2022-07-06 08:31:17Z

1

You could use:

import pandas as pd
import pathlib

data = []
for filename in pathlib.Path.cwd().iterdir():
    if filename.suffix.lower().startswith('.xls'):
        data.append(pd.read_excel(filename).melt())
df = pd.concat(data, ignore_index=True)

Output:

>>> df
     variable value
0          Id     1
1          Id     2
2          Id     3
3        test    ab
4        test    cs
5        test    cs
6    category     4
7    category     3
8    category     1
9       index     1
10      index     2
11      index     3
12     remove    dr
13     remove    as
14     remove    ae
15     stocks     4
16     stocks     3
17     stocks     1
18   category     a
19   category     b
20   category     v

answered Jul 6, 2022 at 8:31

Corralien

121k8 gold badges44 silver badges69 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

Corralien Over a year ago

If you want to get files on subdirectories, replace .iterdir() with .rglob('*')

Simon GIS Over a year ago

My problem now is all my files are in xlsx and doesn't work. ValueError: Excel file format cannot be determined, you must specify an engine manually.

Corralien Over a year ago

Is openpyxl is installed? Are you sure your file are really .xlsx file? Try to change the extension to .xls

Simon GIS Over a year ago

yes my files are .xlsx and Requirement already satisfied: openpyxl, Yes I'm looking for a script to change the 100 files from .xlsx to .xls

Simon GIS Over a year ago

actually i just did a text changed the files extension to .xls manually , and it return : ValueError: Excel file format cannot be determined, you must specify an engine manually.

|

Bendik Knapstad · Accepted Answer · 2022-07-06 08:26:25Z

1

I have tested it with your example input:

one={"Id": [1,2,3], "test": ["ab","cs","cs"],  "category": [4,3,1]}
two= {"index": [1,2,3],  "remove": ["dr","as","ae"],  "stocks": [4,3,1],  "category": ["a", "b", "v"]}
df1 = pd.DataFrame(one)
df2 = pd.DataFrame(two)
final = pd.concat([df1.melt(),df2.melt()])
final:
    variable value
0         Id     1
1         Id     2
2         Id     3
3       test    ab
4       test    cs
5       test    cs
6   category     4
7   category     3
8   category     1
0      index     1
1      index     2
2      index     3
3     remove    dr
4     remove    as
5     remove    ae
6     stocks     4
7     stocks     3
8     stocks     1
9   category     a
10  category     b
11  category     v

answered Jul 6, 2022 at 8:26

Bendik Knapstad

1,4581 gold badge9 silver badges20 bronze badges

1 Comment

Simon GIS Over a year ago

The problems is that i have about 100 files in the folder. Is there a easy way I could achieve that?

Collectives™ on Stack Overflow

Pandas read excel files in folder and Unpivot columns into Dataframe

2 Answers 2

7 Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

7 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related