Remove value after specific character in pandas dataframe

Question

I am using a pandas dataframe and I would like to remove all information after the first space in each value. My dataframe is similar to this one:

import pandas as pd
d = {'Australia' : pd.Series([0,'1980 (F)\n\n1957 (T)\n\n',1991], index=['Australia', 'Belgium', 'France']),
     'Belgium' : pd.Series([1980,0,1992], index=['Australia','Belgium', 'France']),
    'France' : pd.Series([1991,1992,0], index=['Australia','Belgium', 'France'])}
df = pd.DataFrame(d, dtype='str')

df

I am able to remove the values for one specific column, however the split() function does not apply to the whole dataframe.

f = lambda x: x["Australia"].split(" ")[0]
df = df.apply(f, axis=1)

Does anyone have an idea how I could remove the information after the first space for each value in the dataframe?

Yes, I have seen a similar question. But I want to return my whole dataframe without the information after the space.
– Tox, Commented Mar 27, 2018 at 12:38

jezrael · Accepted Answer · 2018-03-27 12:45:22Z

1

I think you need to convert all columns to strings and then apply the split function:

df = df.astype(str).apply(lambda x: x.str.split().str[0])

Another solution:

df = df.astype(str).applymap(lambda x: x.split()[0])

print(df)
          Australia Belgium France
Australia         0    1980   1991
Belgium        1980       0   1992
France         1991    1992      0
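
As a minimal sketch of the two approaches above, assuming a trimmed-down version of the question's data: pandas 2.1 and later deprecate DataFrame.applymap in favour of DataFrame.map, so the snippet picks whichever one exists. The names first_token, by_column and by_cell are made up for illustration.

```python
import pandas as pd

# Trimmed-down version of the question's data: one cell carries extra text.
df = pd.DataFrame(
    {
        "Australia": [0, "1980 (F)\n\n1957 (T)\n\n", 1991],
        "Belgium": [1980, 0, 1992],
        "France": [1991, 1992, 0],
    },
    index=["Australia", "Belgium", "France"],
)


def first_token(value):
    """Return the text before the first run of whitespace."""
    return value.split()[0]


# Column-wise: vectorised string methods applied to each column.
by_column = df.astype(str).apply(lambda col: col.str.split().str[0])

# Cell-wise: DataFrame.map in pandas >= 2.1, applymap in older releases.
as_text = df.astype(str)
cellwise = as_text.map if hasattr(as_text, "map") else as_text.applymap
by_cell = cellwise(first_token)

print(by_column)
```

One difference worth knowing: on an empty or whitespace-only string, the plain Python x.split()[0] raises an IndexError, while the .str.split().str[0] chain should give a missing value instead.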

answered Mar 27, 2018 at 12:45


Scott Boston · Accepted Answer · 2018-03-27 12:48:25Z

1

Let's try using assign, since the column names in this dataframe are "well tamed", meaning they contain no spaces or special characters:

df.assign(Australia=df.Australia.str.split().str[0])

Output:

          Australia Belgium France
Australia         0    1980   1991
Belgium        1980       0   1992
France         1991    1992      0

Or you can use apply and a lambda function if all your column datatypes are strings:

df.apply(lambda x: x.str.split().str[0])

Or if you have a mixture of numbers and string dtypes then you can use select_dtypes with assign like this:

import numpy as np
df.assign(**df.select_dtypes(exclude=np.number).apply(lambda x: x.str.split().str[0]))
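
To make the mixed-dtype case concrete, here is a small sketch. The frame mixed and its columns year and note are hypothetical and not from the question. It shows the numeric column passing through untouched while only the text column is trimmed.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one numeric column, one text column.
mixed = pd.DataFrame(
    {"year": [1980, 1991], "note": ["1980 (F)", "1991 (T)"]}
)

# Select only the non-numeric columns and keep their first token.
text_cols = mixed.select_dtypes(exclude=np.number)
trimmed = text_cols.apply(lambda col: col.str.split().str[0])

# assign(**frame) unpacks the frame column by column and
# overwrites the columns of the same name in the original.
result = mixed.assign(**trimmed)
print(result.dtypes)
```

Calling .str on a numeric column raises an AttributeError, which is why the numeric columns are filtered out first.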

edited Mar 27, 2018 at 12:48

answered Mar 27, 2018 at 12:42


Tox · Accepted Answer · 2018-03-27 12:42:01Z

0

You could loop over all columns and apply the following to each one:

for column in df:
    df[column] = df[column].str.split().str[0]
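
Note that the question talks about cutting at a space, while split() with no argument, as used in the loop above, cuts at any run of whitespace. A small sketch of the difference, using made-up sample values:

```python
import pandas as pd

# Made-up values: a space, a newline, and a tab as the first separator.
s = pd.Series(["1980 (F)", "1957\n(T)", "1991\t(X)"])

# No argument: split on any run of whitespace (spaces, tabs, newlines).
any_whitespace = s.str.split().str[0]

# Explicit " ": split on the space character only.
space_only = s.str.split(" ").str[0]

print(any_whitespace.tolist())  # ['1980', '1957', '1991']
print(space_only.tolist())      # ['1980', '1957\n(T)', '1991\t(X)']
```

For the data in the question both variants give the same result, because the first separator in the affected cell is a plain space.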

answered Mar 27, 2018 at 12:42
