1

I have several txt files with different file names. I would like to do two things:

1.) Load the data all at once 2.) using partial parts from the file name and add it for the dedicated dataframe as additional column 3.) adding the files together

I have below a really really manual example but want to automize it somehow. How is that possible?

The code looks like the following

import pandas as pd 

#load data files
data1 = pd.read_csv('C:/file1_USA_Car_1d.txt')
data2 = pd.read_csv('C:/file2_USA_Car_2d.txt')
data3 = pd.read_csv('C:/file3_USA_Car_1m.txt')
data4 = pd.read_csv('C:/file3_USA_Car_6m.txt')
data5 = pd.read_csv('C:file3_USA_Car_1Y.txt')

df = pd.DataFrame()

print(df)

df = data1

#--> The input for the column below should be taken from the name of the file 
df['country'] = 'USA'
df['Type'] = 'Car'
df['duration'] = '1d'

print(df)

3 Answers 3

1

Iterate over your files with glob and do some simple splitting on the filenames.

import glob
import pandas as pd

df_list = []
for file in glob.glob('C:/file1_*_*_*.txt'):
    # Tweak this to work for your actual filepaths, if needed.
    country, typ, dur = file.split('.')[0].split('_')[1:]  
    df = (pd.read_csv(file)
            .assign(Country=country, Type=typ, duration=dur))
    df_list.append(df)

df = pd.concat(df_list)
Sign up to request clarification or add additional context in comments.

2 Comments

thanks this is the way i was looking for. but somehow i receive the folloiwng error message: AttributeError: 'DataFrame' object has no attribute 'df'
thanks a lot. I updated the code including import pandas.
0

One way of doing it would be to do this:

all_res = pd.DataFrame()
file_list = ['C:/file1_USA_Car_1d.txt', 'C:/file3_USA_Car_1m.txt', 'etc']
for file_name in file_list:
    tmp = pd.read_csv(file_name)
    tmp['file_name'] = file_name
    all_res = all_res.append(tmp)

all_res = all_res.reset_index()

1 Comment

thanks a lot for the answer. This uses the full string, but I just want partial parts of it. Is that also possible?
0

I'd do something like the following:

from pathlib import Path
from operator import itemgetter

import pandas as pd

file_paths = [
    Path(path_str)
    for path_str in (
        'C:/file1_USA_Car_1d.txt', 'C:/file2_USA_Car_2d.txt',
        'C:/file3_USA_Car_1m.txt', 'C:/file3_USA_Car_6m.txt',
        'C:file3_USA_Car_1Y.txt')
]

def import_csv(csv_path):
    df = pd.read_csv(csv_path)
    df['country'], df['Type'], df['duration'] = itemgetter(1, 2, 3)(csv_path.stem.split('_'))
    return df

dfs = [import_csv(csv_path) for csv_path in file_paths]

This helps encapsulate your desired behavior in a helper function and reduces the things you need to think about.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.