I have a column in a pandas DataFrame whose values are in the format Bxxxx-xx-xx-xx.y. I only need the first part (Bxxxx). How do I split the data? The same column also contains values in the format BSxxxx-xx-xx-xx, which I would like to remove using regex='^BS', but for some reason it's not working. Any help would be appreciated. BTW, I am using the df.filter command.
Consider the example below:
import pandas as pd

df = pd.DataFrame({
    'col': ['B123-34-gd-op', 'BS01010-9090-00s00', 'B000003-3frdef4-gdi-ortp',
            'B1263423-304-gdcd-op', 'Bfoo3-poo-plld-opo', 'BSfewf-sfdsd-cvc']
})
print(df)
Output:
col
0 B123-34-gd-op
1 BS01010-9090-00s00
2 B000003-3frdef4-gdi-ortp
3 B1263423-304-gdcd-op
4 Bfoo3-poo-plld-opo
5 BSfewf-sfdsd-cvc
Now let's do two tasks:
- Extract the Bxxxx part from Bxxxx-xx-xx-xx.
- Remove the strings in BSxxxx format.
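On the second task: the df.filter(regex='^BS') attempt from the question fails because DataFrame.filter matches the regex against labels (column names by default, or the index with axis=0), not against the cell values. Since the column is named col, nothing matches and every column is dropped. A small sketch showing this, plus a value-based mask that does work (the variable names by_label and kept are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col': ['B123-34-gd-op', 'BS01010-9090-00s00']})

# DataFrame.filter tests the regex against labels (column names by default),
# never against cell values. 'col' does not start with 'BS', so every column
# is dropped while all rows are kept.
by_label = df.filter(regex='^BS')
print(by_label.shape)  # (2, 0)

# To filter on values, build a boolean mask from the column's strings.
# str.contains uses re.search, so the '^' anchor limits it to the start.
kept = df[~df['col'].str.contains('^BS', regex=True, na=False)]
print(kept)
```

So the regex '^BS' itself is fine; it just needs to be applied to the values through the .str accessor rather than through df.filter.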
Consider the code below, which uses str.startswith():
df[~df.col.str.startswith('BS')].col.str.split('-').str[0]
Output:
0 B123
2 B000003
3 B1263423
4 Bfoo3
Name: col, dtype: object
Breakdown:
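df[~df.col.str.startswith('BS')] keeps only the rows whose strings do not start with BS (the ~ negates the boolean mask). Next, .col.str.split('-').str[0] splits each remaining string on - and takes the first part. Because the split happens at the first -, a trailing .y part (as in Bxxxx-xx-xx-xx.y) is discarded as well.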
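If you prefer to stay with regular expressions throughout, a possible variant uses str.match to flag the BS rows and str.extract to capture everything before the first -. This is a swapped-in alternative to split, not a requirement; for the sample data it should give the same result as the startswith/split version:

```python
import pandas as pd

# Same sample data as the example above
df = pd.DataFrame({
    'col': ['B123-34-gd-op', 'BS01010-9090-00s00', 'B000003-3frdef4-gdi-ortp',
            'B1263423-304-gdcd-op', 'Bfoo3-poo-plld-opo', 'BSfewf-sfdsd-cvc']
})

# str.match anchors the pattern at the start of each string;
# na=False treats missing values as "not BS" instead of propagating NaN
is_bs = df['col'].str.match(r'BS', na=False)

# Keep the non-BS rows, then capture the run of characters before the first '-'.
# expand=False returns a Series because the pattern has a single group.
first_part = df.loc[~is_bs, 'col'].str.extract(r'^([^-]+)', expand=False)
print(first_part)
```

The original index (0, 2, 3, 4) is preserved, so the result can be assigned back to df or joined with other columns if needed.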