I have a pandas DataFrame that contains string values I want to count. The strings are "SYNONYMOUS_CODING" and "NON_SYNONYMOUS_CODING". I've found that they occur in columns 23, 24, 25, 29 and 31.
Column 23 looks like this:
15392 OAnc=C
15393 114
15394 EFF=NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|Gc...
15395 0/0:30:90.29:0
15396 pSC=0.441
15397 pSC=0.030
15398 bSC=884
...
Column 24 looks like this:
3092 EXON(MODIFIER||||870|RSPH10B|protein_coding|CO...
3093 NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aCg/aT...
3094 INTERGENIC(MODIFIER||||||||||1)
3095 INTERGENIC(MODIFIER||||||||||1)
3096 DOWNSTREAM(MODIFIER||489|||PMS2||CODING|NR_003...
3097 DOWNSTREAM(MODIFIER||408|||PMS2||CODING|NR_003...
3098 DP=12
...
Column 25 looks like:
13062 C
13063 C
13064 EFF=SYNONYMOUS(MODIFIER|||||DKFZp434L192||CODING...
13065 EFF=SYNONYMOUS(MODIFIER|||||DKFZp434L192||CODING...
13066 CAnc=G
13067 C
13068 G
Column 29 looks like:
15688 0:0
15689 0:0
15690 NaN
15691 EFF=SYNONYMOUS_CODING(LOW|SILENT|tcC/tcG|S782|...
15692 0:0
15693 NaN
15694 0:1
And column 31 looks like:
3081 45
3082 1432:0
3083 0:0
3084 SYNONYMOUS_CODING(LOW|SILENT|acG/acA|T473|482|...
3085 9
3086 0:0
3087 0:0
How can I go through these five columns and count the number of times the strings "SYNONYMOUS_CODING" or "NON_SYNONYMOUS_CODING" appear, without double counting? There might be rows where these strings appear in two or more different columns, and each such row should only be counted once.
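To make the goal concrete, here is a minimal sketch of the kind of count I mean. It runs on made-up toy data, not my real frame: the values, the three-column subset and the names `cols`, `hits`, `n_rows`, `n_non` and `n_syn` are all just for illustration, and I am assuming the columns are labelled by their integer positions. I am not sure this is the best approach.

```python
import pandas as pd

# Hypothetical toy frame; in the real data the strings sit in
# columns 23, 24, 25, 29 and 31.
df = pd.DataFrame({
    23: ["OAnc=C", "EFF=NON_SYNONYMOUS_CODING(MODERATE|MISSENSE)", "pSC=0.441", "114"],
    24: ["DP=12", "NON_SYNONYMOUS_CODING(MODERATE|MISSENSE)", "INTERGENIC(MODIFIER)", "DP=12"],
    29: ["0:0", None, "EFF=SYNONYMOUS_CODING(LOW|SILENT)", "0:1"],
})
cols = [23, 24, 29]  # would be [23, 24, 25, 29, 31] on the real frame


def contains(pattern, regex=False):
    """Boolean frame: True where a cell in `cols` contains `pattern`."""
    return df[cols].apply(
        lambda s: s.astype(str).str.contains(pattern, regex=regex)
    )


# "NON_SYNONYMOUS_CODING" contains "SYNONYMOUS_CODING", so a single
# substring test matches either string.
hits = contains("SYNONYMOUS_CODING")

# any(axis=1) flags each row at most once, however many columns match.
n_rows = int(hits.any(axis=1).sum())

# Separate counts: the negative lookbehind excludes the NON_ variant.
n_non = int(contains("NON_SYNONYMOUS_CODING").any(axis=1).sum())
n_syn = int(contains(r"(?<!NON_)SYNONYMOUS_CODING", regex=True).any(axis=1).sum())

print(n_rows, n_non, n_syn)
```

In the toy frame the second row matches in two columns but is counted only once, which is the behaviour I am after.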
Thank you.
Rodrigo