2

I want to use python to determine if the first instance of an ID value in the "Id" column has a match on a later row in that same column. If it does, then I want to take the value from the "Avail" column for the rows which match that initial "Id" value. Then I want to delete the rows with the duplicate Ids.

Here's my sample data: I have a CSV file that has data like this:

Id,First,Last,Avail  
abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate  
dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate  
gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate  
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-5a10b9261b60=immediate  
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-c5dfe3b1276c=relative|7 

Desired output (v1) (Please note that I don't care about the "First" or "Last" columns from the duplicate rows. I only care about the "Avail" data from those:

Id,First,Last,Avail  
abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate  
dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate  
gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate  
abcdefg,Nancy,Adams,3ec0c158-8782-41ff-8388-5a10b9261b60=immediate  
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-c5dfe3b1276c=relative|7  

Then I'd like to delete the "duplicate" rows, leaving this:

Id,First,Last,Avail  
    abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate  
    dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate  
    gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate 
1

1 Answer 1

1
import pandas as pd

df = pd.DataFrame(data=[
        [1, 'John', 'Smith', 'a'],
        [1, 'John', 'Smith', 'b'],
        [2, 'Kate', 'Smith', 'c'],
    ],
    columns=['ID', 'First', 'Last', 'Avail']
)

output = (df
          .groupby(['ID', 'First', 'Last'], as_index=False)
          .agg({'Avail': lambda x: ';'.join(x)}))

You can use groupby as @Sphinx suggested. An example with the style of output you requested is above.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.