Write specific rows from pandas dataframe to csv file while iterating through dataframe

Question

I have a directory containing many images (*.jpg). Each image has a name. In the same directory I have a file containing the Python code below.

import numpy as np
import pandas as pd
import glob

fd = open('melanoma.csv', 'a')
for img in glob.glob('*.jpg'):
    dataFrame = pd.read_csv('allcsv.csv')
    name = dataFrame['name']
    for i in name:
        #print(i)
        if(i+'.jpg' == img):
            print(i)

In the same directory I have another file (allcsv.csv) containing a large amount of CSV data for all images in the directory, and for many other images as well. The above code compares the image names with the name column in allcsv.csv and prints the matching names. I need to modify this code so that it writes all the data in the row of each matched image to a file named 'melanoma.csv'.

For example:

allcsv.csv

name,age,sex    
ISIC_001,85,female    
ISIC_002,40,female    
ISIC_003,30,male    
ISIC_004,70,female

If the folder has images only for ISIC_002 and ISIC_003:

melanoma.csv

name,age,sex    
ISIC_002,40,female    
ISIC_003,30,male

sudonym · Accepted Answer · 2018-06-05 12:22:57Z

2

First, your code reads the .csv file once for every image. Second, you have a nested for-loop. Neither is ideal. I recommend the following approach:

Step 1 - Create list of image file names

import glob

image_names = [f.replace('.jpg', '') for f in glob.glob("*.jpg")]

Step 2 - Create dataframe with patient names

import pandas as pd

df_patients = pd.read_csv('allcsv.csv')

Step 3 - Filter out healthy patients and dump to csv

df_sick = df_patients[df_patients['name'].isin(image_names)] 
df_sick.to_csv('melanoma.csv', index = False)

Step 4 - Print names of sick patients

# iterrows() yields (index, row) pairs; row['name'] is the value of the 'name' column
for _, row in df_sick.iterrows():
    print(row['name'], 'has cancer')
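
Steps 1 to 3 can also be combined into one small function. This is only a sketch under a few assumptions: the function name filter_rows_with_images and its parameters are made up for illustration, and Path.stem is swapped in for str.replace so that only the final extension is removed from each file name.

```python
from pathlib import Path

import pandas as pd


def filter_rows_with_images(csv_path, image_dir, out_path, name_col="name"):
    """Write the rows of csv_path whose name_col matches a .jpg stem in image_dir.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Path.stem drops only the final extension: "ISIC_002.jpg" -> "ISIC_002"
    image_names = {p.stem for p in Path(image_dir).glob("*.jpg")}

    # Read the CSV once, then keep the rows whose name has a matching image
    df_all = pd.read_csv(csv_path)
    df_matched = df_all[df_all[name_col].isin(image_names)]

    # index=False keeps the pandas row index out of the output file
    df_matched.to_csv(out_path, index=False)
    return df_matched
```

For the layout in the question, a call such as filter_rows_with_images('allcsv.csv', '.', 'melanoma.csv') should do the job, since the images and the CSV file live in the same directory.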

edited Jun 5, 2018 at 12:22

answered Jun 5, 2018 at 7:36

sudonym


5 Comments

Harshit Nagar Over a year ago

This worked after some adjustments. I had to concatenate the values in the name column with '.jpg' in the allcsv.csv file. Otherwise line 1 of step 3 won't show any result. Thanks a lot.

sudonym Over a year ago

You are welcome. I understand. I recommend removing the .jpg from the image_names list rather than adding it to the .CSV. This way, you don't have to modify your original data. I will update my answer accordingly.

sudonym Over a year ago

I updated my answer accordingly but DID NOT TEST IT. Please let me know if it works. Also, I would love if you upvote my answer.

Harshit Nagar Over a year ago

I did upvote. My upvote won't show due to low reputation. Sorry. But the solution works and helped me a lot. Thank you

sudonym Over a year ago

@HarshitNagar No problem man - thanks for coming back to me on that, I appreciate that a lot. All the best and kudos for working on something that helps sick people

min2bro · Accepted Answer · 2018-06-05 07:37:25Z

0

This is just a solution for storing the matched values to a new file melanoma.csv.

Your code can be further improved and optimized.

import pandas as pd
import glob

# Read the csv once, outside the loop
dataFrame = pd.read_csv('allcsv.csv')

# Create a dictionary object with one list per column
d = {'name': [], 'age': [], 'sex': []}

for img in glob.glob('*.jpg'):
    for _, row in dataFrame.iterrows():
        if row['name'] + '.jpg' == img:
            # update dictionary d every time a match is found with all the required values
            d['name'].append(row['name'])
            d['age'].append(row['age'])
            d['sex'].append(row['sex'])

# convert dictionary d to dataframe
df = pd.DataFrame(d, columns=list(d.keys()))
# Save dataframe to csv, without the pandas index column
df.to_csv('--file path--/melanoma.csv', index=False)

answered Jun 5, 2018 at 7:37

min2bro


3 Comments

Harshit Nagar Over a year ago

d['name'] = i d['age']= dataFrame['age'] d['sex'] = dataFrame['sex'] Do I have to list all the columns this way? I have a large number of columns. Is it possible to write the whole row to the dataframe?

sudonym Over a year ago

There is no need to iterate, nor is there any need to touch any columns other than 'name', based on your question. See my answer.

min2bro Over a year ago

I'm creating a dictionary first and then converting the dictionary to a dataframe, so all the columns (name, age, sex) for a row are written to the dataframe.
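
On the question raised in these comments about writing whole rows: selecting rows with a boolean mask keeps every column, so the columns do not have to be listed one by one. The sketch below uses a small in-memory dataframe as a stand-in for allcsv.csv; the "site" column and the hard-coded image_names list are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for allcsv.csv, with one extra made-up column ("site")
df_all = pd.DataFrame(
    {
        "name": ["ISIC_001", "ISIC_002", "ISIC_003"],
        "age": [85, 40, 30],
        "sex": ["female", "female", "male"],
        "site": ["back", "arm", "leg"],
    }
)

# Hypothetical list of image names with the ".jpg" extension already removed
image_names = ["ISIC_002", "ISIC_003"]

# The mask holds one True/False per row; indexing with it keeps all columns
mask = df_all["name"].isin(image_names)
df_matched = df_all[mask]

print(df_matched.columns.tolist())
```

Because only the 'name' column is used to build the mask, the same two lines should work unchanged however many columns the real file has.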
