
I have 10 CSV files. In each file, I want to remove the rows whose UID column contains any of the following numbers: 1002, 1007, 1008.

Please note that all 10 CSV files have the same column names.

# one of the csv files looks like this

import pandas as pd

df = { 
        'UID':[1001,1002,1003,1004,1005,1006,1007,1008,1009,1010],
        'Name':['Ray','James','Juelz','Cam','Jim','Jones','Bleek','Shawn','Beanie','Amil'],
        'Income':[100.22,199.10, 191.13,199.99,230.6,124.2,122.9,128.7,188.12,111.3],
        'Age':[24,32,27,54,23,41,44,29,30,68]
}
 
df = pd.DataFrame(df)
df = df[['UID','Name','Age','Income']]
df 



Attempt

# I know I need a for loop or glob to iterate through the folder and filter out the desired UIDs. My dilemma is that I don't know how to incorporate Steps II and III into Step I.

#Step I: looping through the .csv files in the folder

import os
directory = r'C:\Users\admin'
for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        print(os.path.join(directory, filename))

# Step II: UIDs to be removed - 1002, 1007, 1008

df2 = df[~(df.UID.isin([1002,1007,1008]))] 

# Step III: Export the new dataframes as .csv files (10 csv files)
df2.to_csv(r'mypath\data.csv')
  

Thanks

3 Answers

Try this:

import os
import pandas as pd

directory = r'C:\Users\admin'
for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        filepath = os.path.join(directory, filename)
        df = pd.read_csv(filepath)
        # keep only the rows whose UID is not in the list
        df2 = df[~df['UID'].isin([1002,1007,1008])]
        # write next to the original as <name>_mod.csv, without the index column
        base, ext = filepath.rsplit('.', maxsplit=1)
        df2.to_csv(f'{base}_mod.{ext}', index=False)

Note: @TimRoberts is right that pandas is overkill here, but if you want to learn pandas, this is one potential solution.
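
One thing to watch with a loop like the one above: if it is run a second time on the same folder, it will also pick up the _mod files it wrote earlier. A minimal sketch of one way around that is to write the filtered copies into a separate folder. The function name filter_csv_folder and the out_dir argument are hypothetical names chosen for illustration, and the sketch assumes every file has a UID column, as the question states.

```python
from pathlib import Path

import pandas as pd


def filter_csv_folder(src_dir, out_dir, uids):
    """Write a copy of every CSV in src_dir to out_dir without the given UIDs."""
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # glob("*.csv") is not recursive, so out_dir may live inside src_dir.
    for csv_path in sorted(src_dir.glob("*.csv")):
        df = pd.read_csv(csv_path)
        # Keep only the rows whose UID is not in the removal list.
        kept = df[~df["UID"].isin(uids)]
        target = out_dir / csv_path.name
        # index=False avoids writing an extra unnamed index column.
        kept.to_csv(target, index=False)
        written.append(target)
    return written
```

For the folder in the question, this could be called as filter_csv_folder(r'C:\Users\admin', r'C:\Users\admin\filtered', [1002, 1007, 1008]), where the filtered subfolder is an assumed location. The originals are left untouched and the output files keep their original names.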


You don't need a program for this, and you certainly don't need pandas. If you have Linux tools:

grep -v -e 1002, -e 1007, -e 1008, incoming.csv > fixed.csv

Windows:

findstr /v /c:1002, /c:1007, /c:1008, incoming.csv > fixed.csv

So, in a batch file:

cd C:\Users\admin
mkdir fixed
for %%i in (*.csv) do findstr /v /c:1002, /c:1007, /c:1008, %%i > fixed\%%i
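
A caveat on the patterns above: they match the text 1002, anywhere on a line, so a row whose UID is 11002, or any other field that ends in 1002 and is followed by a comma, would be dropped as well. That cannot happen with the sample data in the question, but if it could in the real files, a minimal sketch using only Python's standard csv module compares the UID column exactly. The function name drop_uids and its parameters are hypothetical, and the sketch assumes each file starts with a header row containing UID.

```python
import csv


def drop_uids(in_path, out_path, uids, column="UID"):
    """Copy in_path to out_path, skipping rows whose UID column is in uids."""
    # CSV fields are read as strings, so compare against string forms.
    unwanted = {str(u) for u in uids}
    dropped = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Exact match on the UID field only, not a substring search.
            if row[column].strip() in unwanted:
                dropped += 1
                continue
            writer.writerow(row)
    return dropped
```

The return value is the number of rows that were skipped, which gives a quick sanity check per file.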

2 Comments

Sadly, I don't have Linux tools.
Which is why I gave you the Windows recipe.

Sorry for my bad English.

Step II:

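If I have not misunderstood, you want to remove the values [1002,1007,1008] from the list [1001,1002,1003,1004,1005,1006,1007,1008,1009,1010] in the df dictionary (the plain dict, before it is converted to a DataFrame). First, find the positions of those values in the UID list:

values = [1002,1007,1008]

drop = {i for i, uid in enumerate(df['UID']) if uid in values}

Then iterate over the keys of the dict and remove those positions from every column, so that the other columns stay aligned with UID:

values = [1002,1007,1008]
drop = {i for i, uid in enumerate(df['UID']) if uid in values}
for key in df.keys():
    # rebuild each column without the dropped positions
    df[key] = [v for i, v in enumerate(df[key]) if i not in drop]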

Step III

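import csv

# newline='' stops the csv module from writing blank lines between rows on Windows
with open('my_file.csv', mode='w', newline='') as file:
    file_writer = csv.writer(file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

    # header row first, then the columns zipped back into rows
    file_writer.writerow(df.keys())
    file_writer.writerows(zip(*df.values()))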
    
