
I am trying to drop duplicate rows, but I get the error: 'Series' object has no attribute 'remove'.

How can I replace the 'remove' call or otherwise fix the AttributeError?

If a row duplicates one in allMYemail.csv, that row must be removed. Here is my code:

import csv
import re
import json
import pandas as pd

df1 = pd.read_csv('allMYemail.csv')
df2 = pd.read_csv('MYallmatchagain.csv')

emailSet = set()
for i, row in df1.dropna().iterrows():
    emailSet.add(row['0'])
# print(emailSet)
output = []
for i,row in df2.iterrows():
    # print(row)
    Birthdate = row['Birthdate']
    Gender = row['Gender']
    Mobile2 = row['Mobile2']
    Salutation = row['Salutation']
    email = row['email']
    firstName = row['firstName']
    lastName = row['lastName']
    name = row['name']
    areaCode = row['areaCode']
    errorCode = row['errorCode']
    localNumber = row['localNumber']
    Status = row['Status']
    Domain = row['Domain']
    ReturnCode = row['ReturnCode']
    matched = False
    for emails in emailSet:
        if emails == email:
            matched = True
            break
    if matched:
        row.remove('Birthdate')
        row.remove('Gender')
        row.remove('Mobile2')
        row.remove('Salutation')
        row.remove('email')
        row.remove('firstName')
        row.remove('lastName')
        row.remove('name')
        row.remove('areaCode')
        row.remove('errorCode')
        row.remove('localNumber')
        row.remove('Status')
        row.remove('Domain')
        row.remove('ReturnCode')
    else:
        pass
    output_obj = {}
    output_obj['Birthdate'] = Birthdate 
    output_obj['Gender'] = Gender
    output_obj['Mobile2'] = Mobile2 
    output_obj['Salutation'] = Salutation 
    output_obj['email'] = email 
    output_obj['firstName'] = firstName 
    output_obj['lastName'] = lastName 
    output_obj['name'] = name
    output_obj['areaCode'] = areaCode
    output_obj['errorCode'] = errorCode 
    output_obj['localNumber'] = localNumber 
    output_obj['Status'] = Status 
    output_obj['Domain'] = Domain
    output_obj['ReturnCode'] = ReturnCode 
    output.append(output_obj)
df = pd.read_json(json.dumps(output))
# print(json.dumps(output))
df.to_csv(r'MYfinish.csv', index = None)

Any help would be very much appreciated.


2 Answers


Your question is not clear about what you want to do. If you only want to remove fully duplicated rows within a single DataFrame, @Renaud's solution will do the job. If you want to remove rows based on duplicates in a single column, 'email', try this:

def firstline(d):
    # Return the first row of each group
    return d.reset_index(drop=True).loc[0]

result_df = df.groupby('email').apply(firstline)
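
As a possible alternative, drop_duplicates() accepts a subset argument, so the same "keep the first row per email" idea can be sketched without groupby. The sample data below is made up for illustration; only the 'email' column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the 'email' column name is from the question.
df = pd.DataFrame({
    "email": ["a@x.com", "b@x.com", "a@x.com"],
    "name": ["Ann", "Bob", "Ann again"],
})

# Keep the first row seen for each email and drop any later repeats.
result_df = df.drop_duplicates(subset="email", keep="first").reset_index(drop=True)
print(result_df)
```

Unlike the groupby version, this keeps the rows in their original order and leaves 'email' as an ordinary column rather than moving it into the index.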

1 Comment

thank you. yes, I want to remove the rows based on the duplicates in a single column 'email' .

Have you tried drop_duplicates() from pandas?

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html

df.drop_duplicates(inplace=True)
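
About the error itself: each row yielded by iterrows() is a pandas Series, which has no remove method, and changing that row would not alter the original DataFrame anyway. If the goal is what the loop in the question appears to attempt (dropping rows of the second file whose email already appears in the first), a boolean mask with isin() may be simpler. This is a minimal sketch under that assumption; the small in-memory frames stand in for the two CSV files, and the column names '0' and 'email' are taken from the question's code.

```python
import pandas as pd

# Hypothetical stand-ins for allMYemail.csv (df1) and MYallmatchagain.csv (df2).
df1 = pd.DataFrame({"0": ["a@x.com", None, "c@x.com"]})
df2 = pd.DataFrame({
    "email": ["a@x.com", "b@x.com", "c@x.com", "d@x.com"],
    "name": ["Ann", "Bob", "Cat", "Dan"],
})

# Collect the known emails, ignoring missing values.
known_emails = set(df1["0"].dropna())

# True where the row's email is already present in df1.
already_seen = df2["email"].isin(known_emails)

# Keep only the rows that were NOT matched.
filtered = df2[~already_seen].reset_index(drop=True)
print(filtered)
```

With the real files, df1 and df2 would come from pd.read_csv as in the question, and the result could be written out with filtered.to_csv('MYfinish.csv', index=False).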
