How to read csv formatted numeric data into Pandas

Question

I have a CSV file with two formatted columns that currently read in as objects:

One contains percentage values, which read in as strings like '0.01%'. The % is always at the end.
The other contains currency values, which read in as strings like '$1234.5'.

I have tried using the split function to remove the % or $ inside the dataframe, then calling float on the result of the split. This prints the correct result but does not assign the value. It also raises a type error saying that float has no split function, even though I call split before float.

Thanks to all who helped.

– frogfanitw, Commented Aug 26, 2018 at 20:26

fffrost · Accepted Answer · 2018-08-26 16:06:23Z

3

Try this:

import pandas as pd

df = pd.read_csv('data.csv')

"""
The example df looks like this:
    col1     col2
0  3.04%  $100.25
1  0.15%    $1250
2  0.22%     $322
3  1.30%     $956
4  0.49%     $621
"""

# Keep the text before '%' and the text after '$'
df['col1'] = df['col1'].str.split('%', expand=True)[0]
df['col2'] = df['col2'].str.split('$', n=1, expand=True)[1]

# Convert the cleaned strings to numbers
df[['col1', 'col2']] = df[['col1', 'col2']].apply(pd.to_numeric)
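
Since the question asks about reading the file, the same cleanup could also be done at parse time. This is only a sketch, assuming the column names from the example above and no empty cells. `parse_percent`, `parse_currency`, and the inline `csv_text` are hypothetical stand-ins for illustration; `converters` is a standard `read_csv` parameter.

```python
import io

import pandas as pd

# Hypothetical inline data standing in for data.csv
csv_text = """col1,col2
3.04%,$100.25
0.15%,$1250
0.22%,$322
"""


def parse_percent(text):
    # Drop the trailing '%'; the value stays in percent units (3.04, not 0.0304)
    return float(text.rstrip('%'))


def parse_currency(text):
    # Drop the leading '$'
    return float(text.lstrip('$'))


# Each converter receives the raw cell text for its column
df = pd.read_csv(
    io.StringIO(csv_text),
    converters={'col1': parse_percent, 'col2': parse_currency},
)
print(df)
```

An empty cell would reach the converter as an empty string and make `float` raise, so this version only suits files without missing values.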

edited Aug 26, 2018 at 16:06

answered Aug 26, 2018 at 16:00


BlueSheepToken · Accepted Answer · 2018-08-26 16:26:37Z

1

You are probably looking for the apply method, for example:

df['first_col'] = df['first_col'].apply(lambda x: float(x.strip('%')))
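
One possible cause of the "float has no split" error in the question, which the question does not confirm, is a missing value in the column: pandas stores missing entries as NaN, which is a float, so a string method called on that element fails. A minimal sketch of a variant that tolerates missing values, using a hypothetical Series `s` rather than the asker's data:

```python
import pandas as pd

# Hypothetical column with one missing value
s = pd.Series(['3.04%', None, '0.22%'])

# The .str accessor passes missing entries through instead of raising,
# and pd.to_numeric turns them into NaN in the float result
cleaned = pd.to_numeric(s.str.rstrip('%'))
print(cleaned)
```

The vectorized `.str` methods also avoid calling a Python lambda once per row.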

edited Aug 26, 2018 at 16:26

answered Aug 26, 2018 at 15:49
