I have more than 100 XLSX files in a folder, each with different column names and data types.

File 1:

Id  test  category
1   ab      4
2   cs      3
3   cs      1

File 2:

index  remove  stocks  category
1      dr      4         a
2      as      3         b
3      ae      1         v

File 3: ....

File 4.....

This is my attempt, based on another example:

    import os
    import pandas as pd

    # Current directory (including the Python script and all Excel files)
    mydir = (os.getcwd()).replace('\\','/') + '/'

    # Get all Excel files, including subdirectories
    filelist=[]
    for path, subdirs, files in os.walk(mydir):
        for file in files:
            if (file.endswith('.xlsx') or file.endswith('.xls') or file.endswith('.XLS')):
                filelist.append(os.path.join(path, file))
    number_of_files=len(filelist)
    print(filelist)

    # Read all Excel files and save to dataframe (df[0] - df[x]),
    # x is the number of Excel files that have been read - 1
    df=[]
    for i in range(number_of_files):
        try:
            df.melt(pd.read_excel(r''+filelist[i]))
        except:
            print('Empty Excel File')
    print(df)

RESULTS:

Empty Excel File
Empty Excel File
Empty Excel File
Empty Excel File
[]

How could I unpivot the data instead of appending it as columns?

I want to unpivot the data from all my files into this dataframe format.

Dataframe:

Id    1
Id    2
Id    3
test  ab
test  cs
test  cs
category 4
category 3
category 1
index    1
index    2
index    3
remove   dr
remove   as
remove   ae
stocks   4
stocks   3
stocks   1
category a
category b
category v
  • Have you tried the melt method? I think it does exactly what you are looking for. Commented Jul 6, 2022 at 8:15
  • If I call df.melt, it returns empty results. Commented Jul 6, 2022 at 8:19
  • Can you post your complete script, as well as an example of what your dataframe looks like after concatenating it from the files? Commented Jul 6, 2022 at 8:21
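
A note on the script in the question, as far as can be judged from what is posted: `df` is a plain Python list, and lists have no `melt` method, so `df.melt(...)` raises an `AttributeError` on every iteration. The bare `except:` catches that error and prints 'Empty Excel File', which would explain why the list stays empty. A minimal sketch that reproduces this without any Excel files (the sample frame is made up for illustration):

```python
import pandas as pd

# Stand-in for one file's contents; in the question this comes from pd.read_excel.
frame = pd.DataFrame({"Id": [1, 2, 3], "test": ["ab", "cs", "cs"]})

df = []
try:
    df.melt(frame)  # same call shape as in the question: melt called on a list
except AttributeError as exc:
    error_message = str(exc)  # the list type has no "melt" attribute

# melt is a DataFrame method, so call it on the frame and append the result.
df.append(frame.melt())
```

Catching a specific exception (or printing the caught one) instead of using a bare `except:` makes this kind of mistake visible right away.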

2 Answers

You could use:

import pandas as pd
import pathlib

data = []
for filename in pathlib.Path.cwd().iterdir():
    if filename.suffix.lower().startswith('.xls'):
        data.append(pd.read_excel(filename).melt())
df = pd.concat(data, ignore_index=True)

Output:

>>> df
     variable value
0          Id     1
1          Id     2
2          Id     3
3        test    ab
4        test    cs
5        test    cs
6    category     4
7    category     3
8    category     1
9       index     1
10      index     2
11      index     3
12     remove    dr
13     remove    as
14     remove    ae
15     stocks     4
16     stocks     3
17     stocks     1
18   category     a
19   category     b
20   category     v
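
Since `category` appears in more than one file, the stacked result no longer shows which file a row came from. A minimal sketch of one way to keep that information, assuming an extra column is acceptable; the column name `source`, the helper `melt_with_source`, and the file labels are all hypothetical, and the frames are built in memory from the question's sample data rather than read from disk:

```python
import pandas as pd

def melt_with_source(frames):
    """Melt each DataFrame and tag its rows with the label it was stored under."""
    melted = [
        frame.melt().assign(source=name)  # "source" is an illustrative column name
        for name, frame in frames.items()
    ]
    return pd.concat(melted, ignore_index=True)

# Sample data from the question; the file names are placeholders.
frames = {
    "file1.xlsx": pd.DataFrame(
        {"Id": [1, 2, 3], "test": ["ab", "cs", "cs"], "category": [4, 3, 1]}
    ),
    "file2.xlsx": pd.DataFrame(
        {"index": [1, 2, 3], "remove": ["dr", "as", "ae"],
         "stocks": [4, 3, 1], "category": ["a", "b", "v"]}
    ),
}
result = melt_with_source(frames)
```

In the loop over real files, the same `.assign(source=filename.name)` call could be chained after `.melt()`.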

7 Comments

If you want to include files in subdirectories, replace .iterdir() with .rglob('*')
My problem now is that all my files are .xlsx and it doesn't work: ValueError: Excel file format cannot be determined, you must specify an engine manually.
Is openpyxl installed? Are you sure your files are really .xlsx files? Try changing the extension to .xls
Yes, my files are .xlsx, and openpyxl is installed ("Requirement already satisfied: openpyxl"). I'm looking for a script to change the 100 files from .xlsx to .xls
Actually, I just tested changing the file extensions to .xls manually, and it returned: ValueError: Excel file format cannot be determined, you must specify an engine manually.
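
Renaming a file does not change its contents, so the error above probably means that at least one file in the folder is not a real workbook. Possible causes (not confirmed in this thread) are a CSV or HTML export saved with an Excel extension, or a `~$...` lock file that Excel creates while a workbook is open. A minimal diagnostic sketch that looks at the first bytes of each file instead of its extension; the helper names `sniff_excel_format` and `report_formats` are hypothetical:

```python
from pathlib import Path

# Leading bytes of the two Excel container formats.
ZIP_MAGIC = b"PK\x03\x04"                         # .xlsx is a ZIP archive
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls is an OLE2 file

def sniff_excel_format(path):
    """Guess the real format from the first bytes, ignoring the extension."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    if head.startswith(OLE2_MAGIC):
        return "xls"
    return "unknown"

def report_formats(folder):
    """Map each *.xls* file in folder to its sniffed format, skipping lock files."""
    report = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.name.startswith("~$"):
            continue  # "~$name.xlsx" files appear while a workbook is open in Excel
        if path.suffix.lower().startswith(".xls"):
            report[path.name] = sniff_excel_format(path)
    return report
```

Files reported as "unknown" are the ones worth opening in a text editor. For files that do turn out to be real workbooks, `pd.read_excel(path, engine="openpyxl")` sets the engine explicitly for .xlsx.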

I have tested it with your example input:

import pandas as pd

one = {"Id": [1, 2, 3], "test": ["ab", "cs", "cs"], "category": [4, 3, 1]}
two = {"index": [1, 2, 3], "remove": ["dr", "as", "ae"], "stocks": [4, 3, 1], "category": ["a", "b", "v"]}
df1 = pd.DataFrame(one)
df2 = pd.DataFrame(two)
final = pd.concat([df1.melt(), df2.melt()])

>>> final
    variable value
0         Id     1
1         Id     2
2         Id     3
3       test    ab
4       test    cs
5       test    cs
6   category     4
7   category     3
8   category     1
0      index     1
1      index     2
2      index     3
3     remove    dr
4     remove    as
5     remove    ae
6     stocks     4
7     stocks     3
8     stocks     1
9   category     a
10  category     b
11  category     v
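
The index restarts at 0 for the second frame because `pd.concat` keeps each frame's own index by default. A minimal sketch, assuming a single continuous index is wanted, that rebuilds the same two frames and passes `ignore_index=True`:

```python
import pandas as pd

one = {"Id": [1, 2, 3], "test": ["ab", "cs", "cs"], "category": [4, 3, 1]}
two = {"index": [1, 2, 3], "remove": ["dr", "as", "ae"],
       "stocks": [4, 3, 1], "category": ["a", "b", "v"]}
df1 = pd.DataFrame(one)
df2 = pd.DataFrame(two)

# ignore_index=True discards the per-frame indexes and numbers the rows 0..n-1.
final = pd.concat([df1.melt(), df2.melt()], ignore_index=True)
```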

1 Comment

The problem is that I have about 100 files in the folder. Is there an easy way to achieve that?
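
One way to scale the same idea to a whole folder is to loop over the files and melt each one before concatenating. A minimal sketch under a few assumptions: the helper name `melt_folder` and its parameters are hypothetical, every matching file is a readable workbook, and only the first sheet of each file is needed. The `reader` parameter defaults to `pd.read_excel` and exists mainly so the function can be tried with other formats:

```python
from pathlib import Path

import pandas as pd

def melt_folder(folder, pattern="*.xlsx", recursive=False, reader=pd.read_excel):
    """Read every file matching pattern under folder and stack the melted frames."""
    root = Path(folder)
    paths = root.rglob(pattern) if recursive else root.glob(pattern)
    melted = [
        reader(path).melt()
        for path in sorted(paths)
        if not path.name.startswith("~$")  # skip Excel lock files
    ]
    if not melted:
        # Nothing matched: return an empty frame with the melted column names.
        return pd.DataFrame(columns=["variable", "value"])
    return pd.concat(melted, ignore_index=True)
```

Calling `melt_folder(".")` would process the current folder, and `recursive=True` would include subfolders. Reading .xlsx files this way requires openpyxl to be installed, and on case-sensitive file systems the default pattern will not match names such as `DATA.XLSX`.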
