I have a pandas dataframe with a column of 9-character strings. I would like to find the groups of rows whose strings start with the same 3 characters.
My current solution adds a new column to the dataframe that holds the first 3 characters of each string, but I would like to solve this without creating a new column (since I then have to delete it). I generally prefer not to alter the dataframe if I can help it.
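For a single known prefix, one way to avoid the helper column is to build the boolean mask on the fly and index with it, so the dataframe itself is never modified. A minimal sketch, assuming the column is called strings as in the example and using 'tnc' as an illustrative prefix; str.startswith and comparing the str[:3] slice are two equivalent ways to build the mask here:

```python
import pandas as pd

# Small sample in the same shape as the example: 9-character strings
df = pd.DataFrame({
    "cid": [1, 2, 3, 4],
    "strings": ["tncduuqcr", "xqjfykalt", "tncknojbi", "arzouazgz"],
})

# Boolean mask computed on the fly; df itself gets no new column
mask = df["strings"].str.startswith("tnc")
tnc_rows = df[mask]

# Equivalent: compare the sliced prefix directly with the target value
same_rows = df[df["strings"].str[:3] == "tnc"]

print(tnc_rows)
print(list(df.columns))  # still ['cid', 'strings']
```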
Example:
import pandas as pd
# sample dataframe:
cid=[1,2,3,4,5,6,7,8,9,10]
strings=[
'tncduuqcr',
'xqjfykalt',
'arzouazgz',
'tncknojbi',
'xqjgfcekh',
'arzupnzrx',
'tncfjxyox',
'xqjeboxdn',
'arzphbdcs',
'tnctnfoyi',
]
df=pd.DataFrame(list(zip(cid,strings)),columns=['cid','strings'])
# This is the step I would like to avoid doing:
df['short_strings']=df['strings'].str[0:3]
out_dict={}
for x in df['short_strings'].unique():
    df2=df[df['short_strings']==x]
    out_dict[x]=df2
# the separate dataframes:
for x in out_dict.keys():
    print(out_dict[x])
Output:
   cid    strings short_strings
0    1  tncduuqcr           tnc
3    4  tncknojbi           tnc
6    7  tncfjxyox           tnc
9   10  tnctnfoyi           tnc
   cid    strings short_strings
1    2  xqjfykalt           xqj
4    5  xqjgfcekh           xqj
7    8  xqjeboxdn           xqj
   cid    strings short_strings
2    3  arzouazgz           arz
5    6  arzupnzrx           arz
8    9  arzphbdcs           arz
I have tried simply comparing with == df['strings'].str[0:3], but this does not seem to work.
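The comparison does work without a helper column if the sliced Series is kept in a local variable and compared with one prefix value x at a time; grouping by that same Series builds the whole dictionary in one step. A minimal sketch that rebuilds the sample data from the question (the names prefix and out_dict_gb, and the sort=False choice, are illustrative and not from the original code):

```python
import pandas as pd

cid = list(range(1, 11))
strings = [
    "tncduuqcr", "xqjfykalt", "arzouazgz", "tncknojbi", "xqjgfcekh",
    "arzupnzrx", "tncfjxyox", "xqjeboxdn", "arzphbdcs", "tnctnfoyi",
]
df = pd.DataFrame({"cid": cid, "strings": strings})

# The prefix Series lives in a local variable, not in df
prefix = df["strings"].str[:3]

# Option 1: the original loop, with the sliced Series on the left of ==
# and the current prefix value x on the right
out_dict = {x: df[prefix == x] for x in prefix.unique()}

# Option 2: group by the Series directly; sort=False keeps the order
# in which prefixes first appear, matching unique()
out_dict_gb = {key: group for key, group in df.groupby(prefix, sort=False)}

# Each group keeps only the original columns (cid, strings)
for key, frame in out_dict_gb.items():
    print(key)
    print(frame)
```

groupby sorts the keys by default, so sort=False is what preserves the tnc, xqj, arz order produced by unique(); drop it if sorted keys are preferable.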