I have a problem with looping again. I have this script, which modifies one CSV file, "TTF-Projects-INFO", and saves the modified CSV as "newTTF-Projects-INFO":

import pandas as pd
import csv

df = pd.read_csv('TTF-Projects-INFO.csv', sep=": \s+", engine='python', names=['dane', 'wartosc'])

# create a column with names of the form path_to_file:font.ttf
df['dana_czcionka'] = df['dane'].str.split(':').str[0]

print('\n--- df ---\n')
print(df.to_string())

with open('newTTF-Projects-INFO.csv', 'w') as f_out:
    writer = csv.writer(f_out)
    
# group data by the path_to_file:font.ttf column
    for name, data in df.groupby('dana_czcionka'):
        print('\n---', name, '---\n')
        
        headers = (data['dane'] + ":").to_list()
        print(headers)
    
        values = data['wartosc'].to_list()
        print(values)
        values.insert(0, name)  
        values.insert(0, name) 
        #writer.writerow(headers) 
        writer.writerow(values)
            
# show the result in the terminal, as saved to the new file

print('\n--- file ---\n')
print(open('newTTF-Projects-INFO.csv').read())

Now I have to modify the script so that it does the same thing for all CSV files in a folder. So far I have managed to write something like this:

from pathlib import Path 
import pandas as pd
dir = r'/users/krzysztofpaszta/CSVtoGD' 
csv_files = [f for f in Path(dir).glob('*.csv')] 



for csv in csv_files: #iterate list
   
    df = pd.read_csv('*.csv', sep=": \s+", engine='python', names=['dane', 'wartosc'])

    # create a column with names of the form path_to_file:font.ttf
    df['dana_czcionka'] = df['dane'].str.split(':').str[0]

    print('\n--- df ---\n')
    print(df.to_string())

    with open('*.csv', 'w') as f_out:
        writer = csv.writer(f_out)
    
# group data by the path_to_file:font.ttf column
    for name, data in df.groupby('dana_czcionka'):
        print('\n---', name, '---\n')
        
        headers = (data['dane'] + ":").to_list()
        print(headers)
    
        values = data['wartosc'].to_list()
        print(values)
        values.insert(0, name) # adds the name (path) to every data row
        values.insert(0, name) # adds a second copy of the file path: one path is shortened later (in Bash) so that Python can sort on it, the other is kept for reference
        #writer.writerow(headers) 
        writer.writerow(values)
            
# show the result in the terminal, save to a new file

    print(f'{csv.name} saved.')

But unfortunately it does not work. I don't know how to write the part that loops through every file in the folder "CSVtoGD".

I get an error on this line:

df = pd.read_csv('*.csv', sep=": \s+", engine='python', names=['dane', 'wartosc'])

So I am guessing that my expression '*.csv' is not correct. I just want the original script to go through a folder of CSVs rather than only one specified CSV. Is there a good solution for that?

EDIT: So far I have changed the code, but now I get an error:

AttributeError                            Traceback (most recent call last)
/var/folders/zw/12ns4dw96zb34ktc_vfn0zp80000gp/T/ipykernel_49714/1288759270.py in <module>
     16 
     17     with open('csv', 'w') as f_out:
---> 18         writer = csv.writer(f_out)
     19 
     20 # group data by the path_to_file:font.ttf column

AttributeError: 'PosixPath' object has no attribute 'writer'

The modified code looks like this:

from pathlib import Path 
import pandas as pd
dir = r'/users/krzysztofpaszta/CSVtoGD' 
csv_files = [f for f in Path(dir).glob('*.csv')] 



for csv in csv_files: #iterate list
   
    df = pd.read_csv(csv, sep=": \s+", engine='python', names=['dane', 'wartosc'])
    # create a column with names of the form path_to_file:font.ttf
    df['dana_czcionka'] = df['dane'].str.split(':').str[0]

    print('\n--- df ---\n')
    print(df.to_string())

    with open('csv', 'w') as f_out:
        writer = csv.writer(f_out)
    
# group data by the path_to_file:font.ttf column
        for name, data in df.groupby('dana_czcionka'):
            print('\n---', name, '---\n')
        
            headers = (data['dane'] + ":").to_list()
            print(headers)
    
            values = data['wartosc'].to_list()
            print(values)
            values.insert(0, name) # adds the name (path) to every data row
            values.insert(0, name) # adds a second copy of the file path: one path is shortened later (in Bash) so that Python can sort on it, the other is kept for reference
        #writer.writerow(headers) 
            writer.writerow(values)
            
# show the result in the terminal, save to a new file

    print(f'{csv.name} saved.')
  • When you loop through csv_files, csv holds the current file on each iteration, so you should read it with df = pd.read_csv(csv, sep=...), open it with with open(csv, 'w') as f_out, and finally call f_out.write(). Commented May 26, 2022 at 13:16
  • @Ze'evBen-Tsvi Now I'm running into trouble with this part: with open(csv, 'w') as f_out: ---> 21 writer = csv.writer(f_out) AttributeError: 'PosixPath' object has no attribute 'writer'. I think I'm making some rookie mistake in the code, but I can't find where the problem is. Commented May 26, 2022 at 16:08
  • 'csv' is the name of the file that you open; the object name is f_out. You should manipulate f_out and then write to it with f_out.write(). Commented May 26, 2022 at 16:18
  • @Ze'evBen-Tsvi I keep trying to understand it. I know you shouldn't have to help me more, so I will try to get this script working :P Thank you for the help. Commented May 26, 2022 at 16:43

1 Answer

Right, '*.csv' will not work: pd.read_csv does not expand wildcards. It is also not needed, because read_csv is inside the loop and receives the CSV files one by one. So you just have to pass the csv loop variable instead:

df = pd.read_csv(csv, sep=": \s+", engine='python', names=['dane', 'wartosc'])
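
As a small illustration of why the wildcard has to be expanded before anything is read, here is a minimal sketch using only the standard library. The helper name list_csv_files and the temporary demo files are made up for this example and do not come from the question.

```python
from pathlib import Path
import tempfile


def list_csv_files(folder):
    """Return the CSV files in folder as a sorted list of Path objects."""
    # glob() expands the '*.csv' pattern; sorted() gives a stable order
    return sorted(Path(folder).glob('*.csv'))


# Demo in a throwaway folder (hypothetical file names)
with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.csv', 'b.csv', 'notes.txt'):
        (Path(tmp) / name).write_text('x:  1\n')

    # A literal '*.csv' is treated as a file name, and no such file exists
    print((Path(tmp) / '*.csv').exists())

    # Each item is a real path that can be handed to a reader one at a time
    for csv_path in list_csv_files(tmp):
        print(csv_path.name)
```

Each Path produced this way can be passed directly to pd.read_csv inside the loop.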

1 Comment

Now I'm running into trouble because of it: with open(csv, 'w') as f_out: ---> 21 writer = csv.writer(f_out) AttributeError: 'PosixPath' object has no attribute 'writer' :(
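
The AttributeError quoted in this comment comes from the loop variable: for csv in csv_files rebinds the name csv to a PosixPath, so csv.writer no longer refers to the csv module (which the modified script also no longer imports). Separately, open('csv', 'w') writes to a file literally named "csv". Below is a minimal sketch of the file-handling skeleton with both points addressed. It is not the asker's full script: process_folder and the make_rows callback are hypothetical names, and skipping files that start with "new" is an assumption made so that a second run does not treat earlier output as input.

```python
import csv
from pathlib import Path


def process_folder(folder, make_rows):
    """Write 'new<name>' next to every CSV in folder and return the new paths.

    make_rows is a hypothetical callback: it receives one input Path and
    returns the rows (lists of values) to write for that file.
    """
    written = []
    # sorted() materialises the listing before any output file is created
    for csv_path in sorted(Path(folder).glob('*.csv')):
        if csv_path.name.startswith('new'):
            continue  # assumed to be output from an earlier run
        # e.g. TTF-Projects-INFO.csv -> newTTF-Projects-INFO.csv
        out_path = csv_path.with_name('new' + csv_path.name)
        with open(out_path, 'w', newline='') as f_out:
            writer = csv.writer(f_out)  # csv is still the module here
            writer.writerows(make_rows(csv_path))
        written.append(out_path)
    return written
```

In the question's script, make_rows would wrap the pd.read_csv and groupby steps and return the values lists that are currently passed to writer.writerow. Naming the loop variable csv_path rather than csv is what keeps csv.writer available.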
