Loop in python pandas to modify all CSV's in folder

Question

I am having trouble with looping again. I have this script, which modifies one CSV file, "TTF-Projects-INFO", and saves the modified CSV as "newTTF-Projects-INFO":

import pandas as pd
import csv

df = pd.read_csv('TTF-Projects-INFO.csv', sep=": \s+", engine='python', names=['dane', 'wartosc'])

# creating a column with names: path_to_file:font.ttf
df['dana_czcionka'] = df['dane'].str.split(':').str[0]

print('\n--- df ---\n')
print(df.to_string())

with open('newTTF-Projects-INFO.csv', 'w') as f_out:
    writer = csv.writer(f_out)
    
# grouping data by the path_to_file:font.ttf column
    for name, data in df.groupby('dana_czcionka'):
        print('\n---', name, '---\n')
        
        headers = (data['dane'] + ":").to_list()
        print(headers)
    
        values = data['wartosc'].to_list()
        print(values)
        values.insert(0, name)  
        values.insert(0, name) 
        #writer.writerow(headers) 
        writer.writerow(values)
            
# effect in terminal, saves to new file

print('\n--- file ---\n')
print(open('newTTF-Projects-INFO.csv').read())

Now I have to modify the script so that it does the same thing for every CSV file in a folder. So far I have managed to write something like this:

from pathlib import Path 
import pandas as pd
dir = r'/users/krzysztofpaszta/CSVtoGD' 
csv_files = [f for f in Path(dir).glob('*.csv')] 



for csv in csv_files: #iterate list
   
    df = pd.read_csv('*.csv', sep=": \s+", engine='python', names=['dane', 'wartosc'])

    # creating a column with names: path_to_file:font.ttf
    df['dana_czcionka'] = df['dane'].str.split(':').str[0]

    print('\n--- df ---\n')
    print(df.to_string())

    with open('*.csv', 'w') as f_out:
        writer = csv.writer(f_out)
    
# grouping data by the path_to_file:font.ttf column
    for name, data in df.groupby('dana_czcionka'):
        print('\n---', name, '---\n')
        
        headers = (data['dane'] + ":").to_list()
        print(headers)
    
        values = data['wartosc'].to_list()
        print(values)
        values.insert(0, name) # adds the name (path) to every row of data
        values.insert(0, name) # adds a second copy of the file path: one copy is shortened later (in Bash) so that Python can sort by it, the other is kept for information
        #writer.writerow(headers) 
        writer.writerow(values)
            
# showing the result in the terminal, saving to a new file

    print(f'{csv.name} saved.')

But unfortunately it does not work. I don't know how to write the part that loops through every file in the folder "CSVtoGD".

I get an error on this line:

df = pd.read_csv('*.csv', sep=": \s+", engine='python', names=['dane', 'wartosc'])

So I am guessing my expression '*.csv' is not correct. I just want the original script to go through a folder of CSV files instead of only one specified CSV. Is there a good solution for that?

EDIT: I have changed the code, but now I get an error:

AttributeError                            Traceback (most recent call last)
/var/folders/zw/12ns4dw96zb34ktc_vfn0zp80000gp/T/ipykernel_49714/1288759270.py in <module>
     16 
     17     with open('csv', 'w') as f_out:
---> 18         writer = csv.writer(f_out)
     19 
     20 # grupowanie danych według kolumn ścieżka_do_pliku:czcionka.ttf

AttributeError: 'PosixPath' object has no attribute 'writer'

The modified code looks like this:

from pathlib import Path 
import pandas as pd
dir = r'/users/krzysztofpaszta/CSVtoGD' 
csv_files = [f for f in Path(dir).glob('*.csv')] 



for csv in csv_files: #iterate list
   
    df = pd.read_csv(csv, sep=": \s+", engine='python', names=['dane', 'wartosc'])
    # creating a column with names: path_to_file:font.ttf
    df['dana_czcionka'] = df['dane'].str.split(':').str[0]

    print('\n--- df ---\n')
    print(df.to_string())

    with open('csv', 'w') as f_out:
        writer = csv.writer(f_out)
    
# grouping data by the path_to_file:font.ttf column
        for name, data in df.groupby('dana_czcionka'):
            print('\n---', name, '---\n')
        
            headers = (data['dane'] + ":").to_list()
            print(headers)
    
            values = data['wartosc'].to_list()
            print(values)
            values.insert(0, name) # adds the name (path) to every row of data
            values.insert(0, name) # adds a second copy of the file path: one copy is shortened later (in Bash) so that Python can sort by it, the other is kept for information
        #writer.writerow(headers) 
            writer.writerow(values)
            
# showing the result in the terminal, saving to a new file

    print(f'{csv.name} saved.')

When you're looping through csv_files, csv holds the current file on every iteration, so you should read it with df = pd.read_csv(csv, sep=... Then open it with with open(csv, 'w') as f_out, and finally call f_out.write()
– Ze'ev Ben-Tsvi, Commented May 26, 2022 at 13:16
@Ze'evBen-Tsvi now I am running into trouble with this part: with open(csv, 'w') as f_out: ---> 21 writer = csv.writer(f_out) AttributeError: 'PosixPath' object has no attribute 'writer'. I think I am making some rookie mistake in the code, but I can't find where the problem is.
– marley01, Commented May 26, 2022 at 16:08
'csv' is the name of the file that you open; the file object is f_out. You should work with f_out and then write the result back with f_out.write()
– Ze'ev Ben-Tsvi, Commented May 26, 2022 at 16:18
@Ze'evBen-Tsvi I keep trying to understand it. I know you shouldn't have to help me any more, so I will try to get this script working :P Thank you for the help
– marley01, Commented May 26, 2022 at 16:43
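
The AttributeError discussed in these comments looks like a case of name shadowing: inside the loop, the name csv refers to the current Path object, not to the standard-library csv module (which the modified script also never imports). A minimal sketch of the effect, using a hypothetical file name example.csv that is not from the original post:

```python
import csv
from pathlib import Path

csv_module = csv  # keep a reference to the module before the name is reused

# Reusing the name `csv` as the loop variable rebinds it to a Path object
for csv in [Path('example.csv')]:
    shadowed_type = type(csv).__name__   # 'PosixPath' or 'WindowsPath'
    has_writer = hasattr(csv, 'writer')  # a Path object has no .writer attribute

# The module itself still provides writer(); only the name was rebound
module_has_writer = hasattr(csv_module, 'writer')
```

Renaming the loop variable (for example to csv_path) and adding import csv at the top of the script should remove this particular error.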

hansestadt_greifswald · Accepted Answer · 2022-05-26 13:20:40Z

1

Right, '*.csv' will not work here: read_csv expects a single file and does not expand wildcards. The wildcard is also not needed, because read_csv is inside the loop and receives the CSV files one by one. So you just have to pass the loop variable csv instead:

df = pd.read_csv(csv, sep=": \s+", engine='python', names=['dane', 'wartosc'])
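
As an illustration of the point above, here is a minimal sketch of a loop that passes each globbed path to read_csv. The function name read_all_csvs and the variable csv_path are assumptions made for this example and do not appear in the question; the separator and column names are copied from it, with the separator written as a raw string to avoid an invalid-escape warning on newer Python versions.

```python
from pathlib import Path

import pandas as pd


def read_all_csvs(folder):
    """Read every *.csv file in `folder` into a DataFrame, keyed by file name."""
    frames = {}
    # sorted() materializes the glob results and makes the order deterministic
    for csv_path in sorted(Path(folder).glob('*.csv')):
        # Pass the Path object itself; read_csv does not expand wildcards
        frames[csv_path.name] = pd.read_csv(
            csv_path,
            sep=r": \s+",
            engine='python',
            names=['dane', 'wartosc'],
        )
    return frames
```

Calling read_all_csvs(r'/users/krzysztofpaszta/CSVtoGD') would then return one DataFrame per CSV file in that folder.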

1 Comment

marley01 Over a year ago

Now I am running into trouble because of it: with open(csv, 'w') as f_out: ---> 21 writer = csv.writer(f_out) AttributeError: 'PosixPath' object has no attribute 'writer' :(
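
One possible way around this error, sketched under a few assumptions: the loop variable is renamed to csv_path so that it no longer hides the csv module, import csv is added, and each result is written to a new file prefixed with new (as in the single-file script) instead of overwriting the input. The function name convert_folder and the rule of skipping files whose names start with new are illustrative choices, not part of the original code.

```python
import csv
from pathlib import Path

import pandas as pd


def convert_folder(folder):
    """Regroup every *.csv in `folder` and write the result to new<name>.csv."""
    written = []
    # list() fixes the set of files up front, so files created below are not revisited
    for csv_path in list(Path(folder).glob('*.csv')):
        if csv_path.name.startswith('new'):
            continue  # skip output of an earlier run (assumed naming convention)

        df = pd.read_csv(csv_path, sep=r": \s+", engine='python',
                         names=['dane', 'wartosc'])
        # The part of 'dane' before the first ':' is the font file path
        df['dana_czcionka'] = df['dane'].str.split(':').str[0]

        out_path = csv_path.with_name('new' + csv_path.name)
        # newline='' is what the csv module documentation recommends for writers
        with open(out_path, 'w', newline='') as f_out:
            writer = csv.writer(f_out)
            for name, data in df.groupby('dana_czcionka'):
                # The path is written twice, as in the original script
                writer.writerow([name, name] + data['wartosc'].to_list())
        written.append(out_path)
    return written
```

The function returns the list of files it wrote, so a call such as convert_folder(r'/users/krzysztofpaszta/CSVtoGD') can be followed by printing each returned path.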