I have dataframe with a column called 'INFO' containing a string delimited by a semicolon like below. I would like to parse the string to return values corresponding to specific substrings and assign it to a new column.
EX. In the table below, I would want to assign the value (after the equals sign) for 'CLNDISDB' to a column of the same name.
I have tried:
df['INFO'].str.split(';',expand=True)
which assigned each string part to a new column but that would still require me to parse each column for the necessary string. Any help would be greatly appreciated.
I can only use python standard libraries
| | INFO |
|----|---------------------------------------------------|
| 0 | AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619;... |
| 1 | AF_ESP=0.00015;AF_EXAC=0.00010;ALLELEID=514926... |
| 2 | ALLELEID=181485;CLNDISDB=MedGen:C4015293,OMIM:... |
| 3 | ALLELEID=514896;CLNDISDB=MedGen:C4015293,OMIM:... |
| 4 | AF_ESP=0.00515;AF_EXAC=0.00831;AF_TGP=0.00339;... |
| 5 | AF_ESP=0.40158;AF_EXAC=0.37025;AF_TGP=0.33886;... |
| 6 | ALLELEID=556509;CLNDISDB=MedGen:C4015293,OMIM:... |
| 7 | ALLELEID=556512;CLNDISDB=MedGen:C4015293,OMIM:... |
| 8 | ALLELEID=171289;CLNDISDB=MedGen:C4015293,OMIM:... |
| 9 | AF_EXAC=0.00001;ALLELEID=171288;CLNDISDB=MedGe... |
| 10 | AF_ESP=0.00038;AF_EXAC=0.00036;AF_TGP=0.00060;... |
| 11 | AF_ESP=0.00987;AF_EXAC=0.00772;AF_TGP=0.01558;... |
| 12 | AF_ESP=0.00070;AF_TGP=0.00080;ALLELEID=446981;... |
| 13 | AF_EXAC=0.00038;ALLELEID=446941;CLNDISDB=MedGe... |
| 14 | AF_EXAC=0.00114;AF_TGP=0.00958;ALLELEID=364282... |
| 15 | ALLELEID=556516;CLNDISDB=MedGen:C3808739,OMIM:... |
| 16 | AF_EXAC=0.00024;ALLELEID=364148;CLNDISDB=MedGe... |
| 17 | ALLELEID=514900;CLNDISDB=MedGen:C3808739,OMIM:... |
| 18 | AF_EXAC=0.42418;AF_TGP=0.28255;ALLELEID=133759... |
| 19 | ALLELEID=364285;CLNDISDB=MedGen:CN169374;CLNDN... |