Generating new columns from row values in Python

Question

I have the following pandas dataframe (HC_subset_umls):

    term             code             source  term_normlz      CUI       CODE        SAB   TTY   STR
0   B-cell lymphoma  meddra:10003899  meddra  b-cell lymphoma  C0079731  MTHU019696  OMIM  PTCS  b-cell lymphoma
1   B-cell lymphoma  meddra:10003899  meddra  b-cell lymphoma  C0079731  10003899    MDR   PT    b-cell lymphoma
2   Astrocytoma      meddra:10003571  meddra  astrocytoma      C0004114  10003571    MDR   PT    astrocytoma
3   Astrocytoma      meddra:10003571  meddra  astrocytoma      C0004114  D001254     MSH   MH    astrocytoma

I would like to group rows based on common CUI and generate new columns.

The desired output is:

    term             code             source  term_normlz      CUI       OMIM_CODE   OMIM_TTY  OMIM_STR         MDR_CODE  MDR_TTY  MDR_STR          MSH_CODE  MSH_TTY  MSH_STR
0   B-cell lymphoma  meddra:10003899  meddra  b-cell lymphoma  C0079731  MTHU019696  PTCS      b-cell lymphoma  10003899  PT       b-cell lymphoma  NA        NA       NA
2   Astrocytoma      meddra:10003571  meddra  astrocytoma      C0004114  NA          NA        NA               10003571  PT       astrocytoma      D001254   MH       astrocytoma

I am using the following lines of code.

HC_subset_umls['OMIM_CODE'] = (
    HC_subset_umls['CUI']
    .map(
        HC_subset_umls
        .groupby('CUI')
        .apply(lambda x: x.loc[x['SAB'].isin(['OMIM']), 'CODE'].values[0])
    )
)


HC_subset_umls['OMIM_TERM'] = (
    HC_subset_umls['CUI']
    .map(
        HC_subset_umls
        .groupby('CUI')
        .apply(lambda x: x.loc[x['SAB'].isin(['OMIM']), 'STR'].values[0])
    )
)

HC_subset_umls['OMIM_TTY'] = (
    HC_subset_umls['CUI']
    .map(
        HC_subset_umls
        .groupby('CUI')
        .apply(lambda x: x.loc[x['SAB'].isin(['OMIM']), 'TTY'].values[0])
    )
)

HC_subset_umls = HC_subset_umls[~(HC_subset_umls['SAB'].isin(['OMIM']))]

And subsequently for the other 'SAB' values like 'MDR' and so on. However, I am getting the following error.

IndexError: index 0 is out of bounds for axis 0 with size 0

Any help is highly appreciated.

You need to create runnable code. It is not clear what HC_subset_umls is. Make your question reproducible.
– Hadij, Commented Dec 20, 2022 at 19:02
HC_subset_umls is the dataframe. What does 'runnable code' mean? Thanks
– rshar, Commented Dec 20, 2022 at 19:04
Create a toy example. Then people can play with it and help you. This piece of code is not useful.
– Hadij, Commented Dec 20, 2022 at 19:16
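
A toy reproduction along the lines the comments ask for might look like the sketch below. It assumes the dataframe holds exactly the four rows shown in the question. The likely cause of the IndexError is that the astrocytoma group (C0004114) has no OMIM row, so the selection is empty and `.values[0]` fails. The helper `first_or_na` is a hypothetical name introduced here for illustration; it is not part of the original post.

```python
import pandas as pd

# Toy frame rebuilt from the four rows shown in the question (an assumption:
# the real HC_subset_umls is larger).
HC_subset_umls = pd.DataFrame({
    "term": ["B-cell lymphoma", "B-cell lymphoma", "Astrocytoma", "Astrocytoma"],
    "code": ["meddra:10003899", "meddra:10003899",
             "meddra:10003571", "meddra:10003571"],
    "source": ["meddra"] * 4,
    "term_normlz": ["b-cell lymphoma", "b-cell lymphoma",
                    "astrocytoma", "astrocytoma"],
    "CUI": ["C0079731", "C0079731", "C0004114", "C0004114"],
    "CODE": ["MTHU019696", "10003899", "10003571", "D001254"],
    "SAB": ["OMIM", "MDR", "MDR", "MSH"],
    "TTY": ["PTCS", "PT", "PT", "MH"],
    "STR": ["b-cell lymphoma", "b-cell lymphoma", "astrocytoma", "astrocytoma"],
})

# The astrocytoma group has MDR and MSH rows only, so this selection is empty.
astro = HC_subset_umls[HC_subset_umls["CUI"] == "C0004114"]
omim_codes = astro.loc[astro["SAB"].isin(["OMIM"]), "CODE"].values

# Indexing position 0 of an empty selection is what raises the IndexError.
try:
    omim_codes[0]
    raised = False
except IndexError:
    raised = True


def first_or_na(values):
    """Hypothetical helper: first element, or pd.NA when nothing matched."""
    return values[0] if len(values) else pd.NA


# Guarded per-CUI lookup that tolerates groups without an OMIM row.
omim_code_by_cui = {
    cui: first_or_na(group.loc[group["SAB"] == "OMIM", "CODE"].values)
    for cui, group in HC_subset_umls.groupby("CUI")
}
```

With a guard like this, `HC_subset_umls['CUI'].map(omim_code_by_cui)` would give a missing value for the astrocytoma rows instead of raising.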

Scott Boston · Accepted Answer · 2022-12-20 19:20:19Z

1

Try using groupby and unstack, then flatten the MultiIndex column headers.

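# df is the question's dataframe (HC_subset_umls)
df_out = (df.groupby(['term', 'code', 'source', 'term_normlz', 'CUI', 'SAB'])
            .first()                   # one row per (CUI, SAB) combination
            .unstack()                 # move SAB from the index into the columns
            .swaplevel(0, 1, axis=1))  # column levels become (SAB, field)
df_out.columns = df_out.columns.map('_'.join)  # flatten to e.g. 'OMIM_CODE'
df_out.reset_index()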

Output:

    term             code  source      term_normlz       CUI  MDR_CODE MSH_CODE   OMIM_CODE MDR_TTY MSH_TTY OMIM_TTY          MDR_STR      MSH_STR         OMIM_STR
0      Astrocytoma  meddra:10003571  meddra      astrocytoma  C0004114  10003571  D001254         NaN      PT      MH      NaN      astrocytoma  astrocytoma              NaN
1  B-cell lymphoma  meddra:10003899  meddra  b-cell lymphoma  C0079731  10003899      NaN  MTHU019696      PT     NaN     PTCS  b-cell lymphoma          NaN  b-cell lymphoma
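
For anyone who wants to run the answer end to end, here is a self-contained sketch. It assumes the input is just the four rows from the question, and it assigns the result of `reset_index()` back to `df_out` so later lines can use it. The `keys` variable is a name introduced here for readability only.

```python
import pandas as pd

# Assumed input: the four rows shown in the question.
df = pd.DataFrame(
    [
        ["B-cell lymphoma", "meddra:10003899", "meddra", "b-cell lymphoma",
         "C0079731", "MTHU019696", "OMIM", "PTCS", "b-cell lymphoma"],
        ["B-cell lymphoma", "meddra:10003899", "meddra", "b-cell lymphoma",
         "C0079731", "10003899", "MDR", "PT", "b-cell lymphoma"],
        ["Astrocytoma", "meddra:10003571", "meddra", "astrocytoma",
         "C0004114", "10003571", "MDR", "PT", "astrocytoma"],
        ["Astrocytoma", "meddra:10003571", "meddra", "astrocytoma",
         "C0004114", "D001254", "MSH", "MH", "astrocytoma"],
    ],
    columns=["term", "code", "source", "term_normlz", "CUI",
             "CODE", "SAB", "TTY", "STR"],
)

# Columns that identify one output row.
keys = ["term", "code", "source", "term_normlz", "CUI"]

df_out = (
    df.groupby(keys + ["SAB"])
      .first()                  # one row per (keys, SAB) combination
      .unstack()                # SAB values become a second column level
      .swaplevel(0, 1, axis=1)  # column levels become (SAB, field)
)
df_out.columns = df_out.columns.map("_".join)  # e.g. ('OMIM', 'CODE') -> 'OMIM_CODE'
df_out = df_out.reset_index()

print(df_out.columns.tolist())
```

Combinations that do not occur in the data, such as OMIM for astrocytoma, come out as NaN, which avoids the empty-selection lookup that raised the IndexError.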
