I am looking to learn how to concatenate multiple columns in Python using pandas. I have a dataset which looks like this:

gene    match_type  drug                 sources           pmids
ABO     Definite    CHEMBL50267          DrugBank          17139284|17016423
ABO     Definite    URIDINE_DIPHOSPHATE  TdgClinicalTrial  17139284|17016423
ABO     Definite    CHEMBL439009         DrugBank          12972418
ABO     Definite    CHEMBL1232343        DrugBank          NA
ABO     Definite    CHEMBL503075         DrugBank          NA

I am trying to bring this into one row (concatenating the drug column, the sources column and the pmids column) to look like:

gene    match_type  drug                                                                         sources                                           pmids
ABO     Definite    CHEMBL1232343 CHEMBL439009 CHEMBL50267 CHEMBL503075 URIDINE_DIPHOSPHATE NA  DrugBank TdgClinicalTrial DrugBank DrugBank DrugBank    0 12972418 17139284|17016423  17139284|17016423 NA NA

I have looked into using if statements with pandas.concat and .iterrows to go through every row, but I have gotten a bit lost, and I am not sure which functions I should have started with to achieve my goal. Any help in the right direction would be appreciated.

This is what I've tried, but a lot of it is wrong, if not everything:

for index, row in data.iterrows():
    if[1,2]==[2,1]:
        pd.concat(['drug'],['interaction_types'],['sources'],['pmids'],)
    else: 
        print(row[:])
  • Can you show what you have tried Commented Feb 18, 2019 at 11:35
  • Done but I don't think I've done anything correctly, not sure where to start with getting pd.concat for only specific columns Commented Feb 18, 2019 at 11:40

1 Answer

Use pd.DataFrame.groupby with agg to join each group's values into one string per column:

joined_df = df.groupby(["gene", "match_type"]).agg(lambda x: ' '.join(x.astype(str))).reset_index()
print(joined_df)
  gene match_type                                               drug  
0  ABO   Definite  CHEMBL50267 URIDINE_DIPHOSPHATE CHEMBL439009 C...   

                                             sources
0  DrugBank TdgClinicalTrial DrugBank DrugBank Dr...   

                                               pmids  
0  17139284|17016423 17139284|17016423 12972418 n...  
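
The truncated pmids output above ends in "n..." because astype(str) turns missing values into the literal string "nan". If that is not wanted, one possible variation is to drop missing values before joining. The sketch below rebuilds the sample rows from the question so it can run on its own; the helper name join_values is made up for illustration, and it assumes the NA entries are read in as real missing values rather than the text "NA".

```python
import pandas as pd

# Sample rows copied from the question; the two NA pmids are assumed
# to be genuine missing values (None/NaN), not the string "NA".
df = pd.DataFrame({
    "gene": ["ABO"] * 5,
    "match_type": ["Definite"] * 5,
    "drug": ["CHEMBL50267", "URIDINE_DIPHOSPHATE", "CHEMBL439009",
             "CHEMBL1232343", "CHEMBL503075"],
    "sources": ["DrugBank", "TdgClinicalTrial", "DrugBank",
                "DrugBank", "DrugBank"],
    "pmids": ["17139284|17016423", "17139284|17016423", "12972418",
              None, None],
})


def join_values(series):
    # Hypothetical helper: skip missing values, then join the rest with spaces.
    return " ".join(series.dropna().astype(str))


# One row per (gene, match_type); every other column becomes a joined string.
joined_df = df.groupby(["gene", "match_type"]).agg(join_values).reset_index()

# Widen the display so the joined strings are not cut off with "...".
pd.set_option("display.max_colwidth", None)
print(joined_df)
```

Rows keep their original order within each group, so the joined values appear in the order they occur in the data. If the file stores missing values as the text "NA", pd.read_csv converts that to NaN by default, so the dropna step should still apply.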