Pandas dataframe row data filtering

Question

I have a column in a pandas DataFrame whose values are in the format Bxxxx-xx-xx-xx.y. I only need the first part (Bxxxx). How do I split the data? The same column also contains values in the format BSxxxx-xx-xx-xx, which I would like to remove using regex='^BS', but for some reason that is not working. Any help would be appreciated. By the way, I am using df.filter.

Consider making a minimal reproducible example. It is a little unclear from your description what output you expect.
– harpan, Commented Jul 19, 2018 at 19:15

Rohith · Accepted Answer · 2018-07-19 19:45:56Z

1

This should work.

df[df.col1.apply(lambda x: x.split("-")[0][0:2]!="BS")].col1.apply(lambda x: x.split("-")[0])


harpan · Accepted Answer · 2018-07-19 19:42:55Z

Consider the example below:

df = pd.DataFrame({
    'col':['B123-34-gd-op','BS01010-9090-00s00','B000003-3frdef4-gdi-ortp','B1263423-304-gdcd-op','Bfoo3-poo-plld-opo', 'BSfewf-sfdsd-cvc']
})
print(df)

Output:

    col
0   B123-34-gd-op
1   BS01010-9090-00s00
2   B000003-3frdef4-gdi-ortp
3   B1263423-304-gdcd-op
4   Bfoo3-poo-plld-opo
5   BSfewf-sfdsd-cvc

Now let's do two tasks:

1. Extract the Bxxxx part from Bxxxx-xx-xx-xx.
2. Remove the strings in BSxxxx format.

Consider the code below, which uses startswith():

df[~df.col.str.startswith('BS')].col.str.split('-').str[0]

Output:

0        B123
2     B000003
3    B1263423
4       Bfoo3
Name: col, dtype: object

Breakdown:

df[~df.col.str.startswith('BS')] gives us all the rows whose string does not start with BS. Next, we split those strings on - and take the first part with .col.str.split('-').str[0].
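Since the question mentions regex='^BS' with df.filter, here is a hedged sketch of a regex-based variant of the same two steps. It reuses the example frame above; the variable names by_label, kept and prefix are only illustrative. As far as I know, df.filter(regex=...) matches the axis labels (here the column name col), not the cell values, which would explain why it appeared not to work.

```python
import pandas as pd

df = pd.DataFrame({
    'col': ['B123-34-gd-op', 'BS01010-9090-00s00', 'B000003-3frdef4-gdi-ortp',
            'B1263423-304-gdcd-op', 'Bfoo3-poo-plld-opo', 'BSfewf-sfdsd-cvc']
})

# df.filter(regex=...) is applied to the labels (the column name 'col'),
# not to the values, so '^BS' selects no columns here.
by_label = df.filter(regex='^BS')

# Filter on the values instead: keep rows whose string does not match ^BS.
kept = df[~df['col'].str.match('^BS')]

# Capture everything before the first hyphen.
prefix = kept['col'].str.extract(r'^([^-]+)', expand=False)
print(prefix)
```

The result should match the startswith() version; which one to use is mostly a matter of taste, although the regex form is easier to extend if the unwanted prefixes become more varied.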

drew_psy · Accepted Answer · 2018-07-19 19:23:46Z

0

You can define a function in which you treat Bxxxx-xx-xx-xx.y as a string and just extract the first 5 characters.

    >>> def edit_entry(x):
    ...     return (str(x)[:5])
    >>> df['Column_name'].apply(edit_entry)


Yilun Zhang · Accepted Answer · 2018-07-19 19:30:07Z

0

A one-liner solution would be:

df["column_name"] = df["column_name"].apply(lambda x: x[:5])
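One caveat with slicing, shown as a small sketch: x[:5] assumes the first part is always exactly five characters long (B plus four characters). The sample values below are borrowed from harpan's example, and the names s, fixed and split are only illustrative. When the prefix length varies, splitting on the hyphen is the safer choice.

```python
import pandas as pd

s = pd.Series(['B123-34-gd-op', 'B000003-3frdef4-gdi-ortp', 'Bfoo3-poo-plld-opo'])

# Fixed-width slice: always the first five characters, whatever they are.
fixed = s.str[:5]

# Split on the hyphen: everything before the first '-'.
split = s.str.split('-').str[0]

print(pd.DataFrame({'fixed': fixed, 'split': split}))
```

Here the slice keeps a trailing hyphen for B123- and truncates B000003 to B0000, while the split returns the full first part in each case.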
