In Python, I want to loop through multiple CSV files and remove specific rows

Question

I have 10 CSV files. In each file, I want to remove the rows whose UID column contains any of the following numbers: 1002, 1007, 1008.

Please note that all 10 CSV files have the same column names.

# One of the CSV files looks like this:

import pandas as pd

df = { 
        'UID':[1001,1002,1003,1004,1005,1006,1007,1008,1009,1010],
        'Name':['Ray','James','Juelz','Cam','Jim','Jones','Bleek','Shawn','Beanie','Amil'],
        'Income':[100.22,199.10, 191.13,199.99,230.6,124.2,122.9,128.7,188.12,111.3],
        'Age':[24,32,27,54,23,41,44,29,30,68]
}
 
df = pd.DataFrame(df)
df = df[['UID','Name','Age','Income']]
df

Attempt

# I know I need a for loop or glob to iterate through the folder and filter out the unwanted UIDs. My problem is that I don't know how to combine Steps II and III with Step I.

# Step I: loop through the .csv files in the folder

import os
directory = r'C:\Users\admin'
for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        print(os.path.join(directory, filename))

# Step II: remove the rows with UID 1002, 1007, 1008

df2 = df[~(df.UID.isin([1002,1007,1008]))] 

# Step III: export the new DataFrames as .csv files (10 CSV files)
df2.to_csv(r'mypath\data.csv')

Thanks

Scott Boston · Accepted Answer · 2021-07-15 00:45:43Z

3

Try this:

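import os
import pandas as pd

directory = r'C:\Users\admin'
for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        filepath = os.path.join(directory, filename)
        df = pd.read_csv(filepath)
        # keep only the rows whose UID is not in the list
        df2 = df[~df['UID'].isin([1002, 1007, 1008])]
        # write data.csv as data_mod.csv next to the original
        root, ext = os.path.splitext(filepath)
        # index=False avoids writing pandas' row index as an extra column
        df2.to_csv(f'{root}_mod{ext}', index=False)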

Note: @TimRoberts is right that pandas is overkill here, but if you want to learn it, here is one possible solution.
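A variation on the same pandas loop, sketched with `pathlib.Path.glob` under the assumption that every file has a `UID` column. The function name `filter_folder` and the output folder name `filtered` are hypothetical. Writing the results into a separate folder, instead of adding a `_mod` suffix next to the originals, means a second run does not pick up its own output.

```python
from pathlib import Path

import pandas as pd

# UIDs whose rows should be dropped (from the question)
UIDS_TO_DROP = [1002, 1007, 1008]


def filter_folder(directory, out_dirname="filtered"):
    """Write a filtered copy of every .csv in `directory` to `directory/out_dirname`.

    `out_dirname` is a hypothetical folder name chosen for this sketch.
    """
    src = Path(directory)
    out = src / out_dirname
    out.mkdir(exist_ok=True)
    for path in sorted(src.glob("*.csv")):
        df = pd.read_csv(path)
        # keep only the rows whose UID is not in the drop list
        kept = df[~df["UID"].isin(UIDS_TO_DROP)]
        # index=False stops pandas from adding an unnamed index column
        kept.to_csv(out / path.name, index=False)
    return out
```

For the folder in the question this would be called as `filter_folder(r'C:\Users\admin')`.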


Tim Roberts · Accepted Answer · 2021-07-15 00:39:52Z

2

You don't need a program for this, and you certainly don't need pandas. If you have Linux tools:

grep -v -e 1002, -e 1007, -e 1008, incoming.csv > fixed.csv

Windows:

findstr /v /c:1002, /c:1007, /c:1008, incoming.csv > fixed.csv

So, in a batch file:

cd /d C:\Users\admin
mkdir fixed
for %%i in (*.csv) do findstr /v /c:1002, /c:1007, /c:1008, %%i > fixed\%%i
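Since pandas is not required, the same filtering can also be sketched in plain Python with the standard-library `csv` module, for anyone without grep. This sketch assumes each file has a header row containing a `UID` column and is UTF-8 encoded (with or without a BOM); the names `filter_file`, `filter_all` and the `fixed` folder (mirroring the batch file) are illustrative. Unlike the grep/findstr lines, it compares only the UID field, so a row is not dropped just because another field, say `11002`, happens to contain the text `1002,`.

```python
import csv
import os

# UIDs to drop, stored as strings because the csv module yields text
UIDS_TO_DROP = {"1002", "1007", "1008"}


def filter_file(src_path, dst_path):
    """Copy src_path to dst_path, skipping rows whose UID is in UIDS_TO_DROP."""
    # utf-8-sig also strips a BOM, which Excel often adds
    with open(src_path, newline="", encoding="utf-8-sig") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        writer.writerow(header)
        uid_col = header.index("UID")  # find the column by name, not position
        for row in reader:
            # skip the row only if its UID field is in the drop list
            if row and row[uid_col].strip() in UIDS_TO_DROP:
                continue
            writer.writerow(row)


def filter_all(directory, out_dirname="fixed"):
    """Filter every .csv in `directory` into a subfolder, like the batch file."""
    out = os.path.join(directory, out_dirname)
    os.makedirs(out, exist_ok=True)
    for name in os.listdir(directory):
        if name.endswith(".csv"):
            filter_file(os.path.join(directory, name), os.path.join(out, name))
```

Calling `filter_all(r'C:\Users\admin')` would then produce the same `fixed` folder as the batch file.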


2 Comments

nasa313 Over a year ago

Sadly, I don't have Linux tools.

Tim Roberts Over a year ago

Which is why I gave you the Windows recipe.

Sergio GM · Accepted Answer · 2021-07-15 00:50:10Z

Sorry for my bad English.

Step II:

If I haven't misunderstood, you want to remove the values [1002,1007,1008] from the list [1001,1002,1003,1004,1005,1006,1007,1008,1009,1010] in the df dictionary (the dict built in the question, before it is turned into a DataFrame). Each key of the dict holds one column, so removing those numbers from the UID list alone would leave the other columns misaligned. Instead, first find the positions of the rows you want to keep:

values = [1002,1007,1008]
keep = [i for i, uid in enumerate(df['UID']) if uid not in values]

then iterate over the keys of the dict and keep only those positions in every column, so the rows stay aligned:

for key in df.keys():
    df[key] = [df[key][i] for i in keep]

Step III

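import csv

with open('my_file.csv', mode='w', newline='') as file:
    file_writer = csv.writer(file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

    # the header row: the column names
    file_writer.writerow(df.keys())
    # the data rows: zip the column lists back together, one tuple per row
    file_writer.writerows(zip(*df.values()))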
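The answer above works on the dict from the question; to apply the same steps to the actual files, each CSV first has to be read into that {column: list} shape. A minimal sketch with `csv.DictReader` follows, assuming every file has a header row with a `UID` column. The function names and the `_mod` suffix (borrowed from the pandas answer) are illustrative, and the UIDs are compared as strings because the `csv` module does not convert types.

```python
import csv
import glob
import os

VALUES = {"1002", "1007", "1008"}  # csv yields strings, so compare as text


def read_columns(path):
    """Read a CSV into {column name: list of values}, the dict shape used above."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = {name: [] for name in reader.fieldnames}
        for row in reader:
            for name in reader.fieldnames:
                columns[name].append(row[name])
    return columns


def drop_uids(columns, values=VALUES):
    """Remove whole rows (the same position in every column) whose UID is in values."""
    keep = [i for i, uid in enumerate(columns["UID"]) if uid not in values]
    return {name: [col[i] for i in keep] for name, col in columns.items()}


def write_columns(columns, path):
    """Write the column dict back out: a header row, then one row per position."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns.keys())
        writer.writerows(zip(*columns.values()))


def process_folder(directory, suffix="_mod"):
    """Write data.csv -> data_mod.csv for every CSV in the folder."""
    for path in glob.glob(os.path.join(directory, "*.csv")):
        if path.endswith(suffix + ".csv"):
            continue  # skip output from an earlier run
        root, ext = os.path.splitext(path)
        write_columns(drop_uids(read_columns(path)), root + suffix + ext)
```

Keeping the row positions in one `keep` list is what lets every column be filtered consistently; removing values column by column would not.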
