
I have a directory containing many images (*.jpg). Each image has a name. In the same directory I have a file containing Python code (below).

import numpy as np
import pandas as pd
import glob

fd = open('melanoma.csv', 'a')
for img in glob.glob('*.jpg'):
    dataFrame = pd.read_csv('allcsv.csv')
    name = dataFrame['name']
    for i in name:
        #print(i)
        if(i+'.jpg' == img):
            print(i) 

In the same directory I have another file (allcsv.csv) containing a large amount of CSV data for all images in the directory, as well as for many other images. The above code compares the names of the images with the name column in allcsv.csv and prints the names that match. I need to modify this code so that the complete row for each matching image is written to a file named 'melanoma.csv'.

Example:

allcsv.csv

name,age,sex    
ISIC_001,85,female    
ISIC_002,40,female    
ISIC_003,30,male    
ISIC_004,70,female     

If the folder only has images for ISIC_002 and ISIC_003, the output should be:

melanoma.csv

name,age,sex    
ISIC_002,40,female    
ISIC_003,30,male

2 Answers


First, your code re-reads the .csv file once for every image. Second, you have a nested for-loop, so every image name is compared against every row. Neither is ideal. I recommend the following approach:

Step 1 - Create list of image file names

import glob

image_names = [f.replace('.jpg', '') for f in glob.glob("*.jpg")]
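One caveat with str.replace: it removes '.jpg' wherever it occurs in a file name, not only at the end. A minimal sketch of an alternative, assuming the names in the CSV equal the file names without their extension, uses pathlib instead. The helper name image_stems is made up for illustration.

```python
from pathlib import Path


def image_stems(directory="."):
    """Return the names of all *.jpg files in `directory`, without extension."""
    # Path.stem drops only the final suffix: 'ISIC_002.jpg' -> 'ISIC_002'
    return sorted(p.stem for p in Path(directory).glob("*.jpg"))


# Same role as the list built in Step 1, for the current directory
image_names = image_stems()
```

The result is sorted only to make it reproducible; the filtering in Step 3 does not depend on the order.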

Step 2 - Create dataframe with patient names

import pandas as pd

df_patients = pd.read_csv('allcsv.csv')

Step 3 - Keep only the patients that have an image and dump them to csv

df_sick = df_patients[df_patients['name'].isin(image_names)] 
df_sick.to_csv('melanoma.csv', index = False)
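To see what the isin filter does without touching any files, here is a small sketch that uses the sample rows from the question. The in-memory DataFrame and the hard-coded image_names list stand in for allcsv.csv and the glob result; they are assumptions for illustration only.

```python
import pandas as pd

# Sample rows copied from the question
df_patients = pd.DataFrame({
    "name": ["ISIC_001", "ISIC_002", "ISIC_003", "ISIC_004"],
    "age": [85, 40, 30, 70],
    "sex": ["female", "female", "male", "female"],
})

# Pretend only these two images exist in the folder
image_names = ["ISIC_002", "ISIC_003"]

# isin() builds a boolean mask; indexing with it keeps whole rows
mask = df_patients["name"].isin(image_names)
df_sick = df_patients[mask]

# to_csv() without a path returns the CSV text instead of writing a file
print(df_sick.to_csv(index=False))
```

Because the mask selects entire rows, every column of the original table is carried over, however many there are.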

Step 4 - Print names of sick patients

# iterrows() yields (index, row) pairs; row['name'] is the value of the 'name' column
for _, row in df_sick.iterrows():
    print(row['name'], 'has cancer')

5 Comments

This worked after some adjustments. I had to concatenate the values in the name column with '.jpg' in the allcsv.csv file. Otherwise line 1 of step 3 won't show any result. Thanks a lot.
You are welcome. I understand. I would recommend removing the .jpg from the image_names list rather than adding it to the .CSV. That way, you don't have to modify your original data. I will update my answer accordingly.
I updated my answer accordingly but DID NOT TEST IT. Please let me know if it works. Also, I would love it if you upvoted my answer.
I did upvote. My upvote won't show due to low reputation. Sorry. But the solution works and helped me a lot. Thank you
@HarshitNagar No problem man - thanks for coming back to me on that, I appreciate that a lot. All the best and kudos for working on something that helps sick people

This is just a solution for storing the matched values to a new file melanoma.csv.

Your code can be further improved and optimized.

import glob
import pandas as pd

# Read the CSV once, outside the loop
dataFrame = pd.read_csv('allcsv.csv')

# Create a dictionary with one list per column
d = {'name': [], 'age': [], 'sex': []}

for img in glob.glob('*.jpg'):
    for idx, i in enumerate(dataFrame['name']):
        if i + '.jpg' == img:
            # append the values of the matching row every time a match is found
            d['name'].append(i)
            d['age'].append(dataFrame['age'].iloc[idx])
            d['sex'].append(dataFrame['sex'].iloc[idx])

# convert dictionary d to dataframe
df = pd.DataFrame(d, columns=list(d.keys()))
# Save dataframe to csv (index=False leaves out the row-number column)
df.to_csv('--file path--/melanoma.csv', index=False)
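
If allcsv.csv has many columns, naming each one in the dictionary gets tedious. A possible sketch that keeps every column without listing it (not the method used above, just one option): put the image names into a one-column DataFrame and inner-merge it with the full table. The function name rows_for_images is hypothetical, and the sample data is taken from the question.

```python
import pandas as pd


def rows_for_images(df_all, image_files):
    """Return the rows of df_all whose 'name' matches an image file (hypothetical helper)."""
    # Strip the '.jpg' suffix so the values compare equal to the 'name' column
    stems = [f[:-4] for f in image_files if f.endswith(".jpg")]
    df_images = pd.DataFrame({"name": stems})
    # An inner merge keeps only names present in both frames, with all columns of df_all
    return df_all.merge(df_images, on="name", how="inner")


df_all = pd.DataFrame({
    "name": ["ISIC_001", "ISIC_002", "ISIC_003", "ISIC_004"],
    "age": [85, 40, 30, 70],
    "sex": ["female", "female", "male", "female"],
})
matched = rows_for_images(df_all, ["ISIC_003.jpg", "ISIC_002.jpg", "ISIC_999.jpg"])
```

Calling matched.to_csv('melanoma.csv', index=False) would then write the result; an image without a row in the table, such as ISIC_999.jpg here, is simply dropped by the inner merge.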

3 Comments

Do I have to list all the columns this way (d['name'], d['age'], d['sex'])? I have a large number of columns. Is it possible to write the whole row to the dataframe?
There is no need to iterate, nor is there any need to touch any columns other than 'name', based on your question. See the other answer.
I'm creating a dictionary first and then importing the dictionary into a dataframe. That way, all the columns (name, age, sex) of a row are written to the dataframe.
