Pandas: converting strings representing numbers, with characters, into float/int

Question

From a daily report, I use:

pd.read_csv(filepath, sep = '\t')

to open a dataframe looking like the below (in simplified format):

finalDf2 = pd.DataFrame(dict(
    Portfolio=pd.Series(['Book1', 'Book1', 'Book2', 'Book3', 'Book1', 'Book1']),
    Strike=pd.Series(['108', '109.10', '111', '114', '108.3', '115.0']),
    Notional=pd.Series(['0', '-0.02', '35', '. 3K', '-0.05K', '0']),
))

By running the below on various entries under the "Notional" column:

type(finalDf2.iloc[ , ])

I see the 0s are of type int already.
The nonzero values however are strings. I tried to convert strings to floats by using:

finalDf2['Notional'].astype(float)

but before doing so, how could I convert all cells containing "K" values? For instance,

. 3K should end up being float or int 30
-0. 05K should end up being float or int -50

Spacings are actually in the file and thus dataframe unfortunately.
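
A minimal sketch of one way to sidestep the mixed int/str types described above, assuming it is acceptable to read every cell as text first and convert afterwards. The tab-separated sample below is hypothetical and only stands in for the real report; `dtype=str` is a standard `read_csv` parameter.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for the daily report.
raw = "Portfolio\tNotional\nBook1\t0\nBook3\t. 3K\nBook1\t-0.05K\n"

# dtype=str keeps every cell as text, so 0 is read as '0' rather than int 0.
df = pd.read_csv(io.StringIO(raw), sep="\t", dtype=str)

# Every entry is now a Python str, which makes later cleaning uniform.
print(df["Notional"].map(type).unique())
print(df["Notional"].tolist())
```

With a uniform string column, the same cleaning rule can then be applied to every cell without type checks.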

Does an extra space in decimal values represent a zero? So does ". 3" stand for ".03"?
– FLab, Commented Jul 5, 2017 at 10:18
". 3K should end up being float or int 30" and "-0. 05K should end up being float or int -50": these two lines contradict each other. Should the space be converted to '0' or to ''?
– Maarten Fabré, Commented Jul 5, 2017 at 10:29
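
The ambiguity raised in these comments can be checked with a short sketch. The helper `k_to_number` is hypothetical (it is not from the question); it parses a 'K'-suffixed string under either reading of the space.

```python
def k_to_number(text, space_replacement):
    """Parse a 'K'-suffixed string after replacing spaces with space_replacement."""
    cleaned = text.replace(" ", space_replacement)
    # Drop the trailing 'K' and scale by 1000.
    return float(cleaned[:-1]) * 1000


for raw in [". 3K", "-0. 05K"]:
    as_zero = k_to_number(raw, "0")    # space read as a missing zero
    as_nothing = k_to_number(raw, "")  # space simply removed
    print(f"{raw!r}: space->'0' gives {as_zero:g}, space removed gives {as_nothing:g}")
```

Under these two readings, ". 3K" gives 30 only when the space becomes '0', while "-0. 05K" gives -50 only when the space is removed. Note that the sample dataframe contains '-0.05K' without a space, in which case replacing spaces with '0' yields both expected values.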

FLab · Accepted Answer · 2017-07-05 10:16:23Z

1

Here is a possible solution:

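def notional_to_num(x):
    # Numbers pass through unchanged.
    if isinstance(x, (int, float)):
        return x
    elif isinstance(x, str):
        # Strings without 'K' are returned as-is; for 'K' values, fill spaces
        # with '0', drop the trailing 'K' and scale by 1000.
        return x if 'K' not in x else float(x.replace(" ", "0")[:-1])*1e3
    else:
        # A bare `raise` has no active exception here, so raise explicitly.
        raise TypeError("Unsupported type: {}".format(type(x)))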

finalDf2.loc[:, 'Notional'] = finalDf2['Notional'].apply(notional_to_num)

Which gives the following output:

  Notional Portfolio  Strike
0        0     Book1     108
1    -0.02     Book1  109.10
2       35     Book2     111
3       30     Book3     114
4      -50     Book1   108.3
5        0     Book1   115.0
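
The function above leaves strings without a 'K' as strings, so the column stays mixed. A possible variant that returns a float for every cell is sketched below. The name `notional_to_float` is hypothetical, and it keeps the answer's assumption that a space stands for a missing zero.

```python
import pandas as pd


def notional_to_float(value):
    """Convert one Notional cell to float; a trailing 'K' means thousands."""
    if isinstance(value, (int, float)):
        return float(value)
    if not isinstance(value, str):
        raise TypeError(f"unsupported type: {type(value).__name__}")
    # Assumption: an embedded space stands for a missing zero digit.
    text = value.strip().replace(" ", "0")
    if text.endswith("K"):
        return float(text[:-1]) * 1000
    return float(text)


notional = pd.Series(["0", "-0.02", "35", ". 3K", "-0.05K", "0"])
converted = notional.apply(notional_to_float)
print(converted.dtype)
print(converted.tolist())
```

Because every branch returns a float, the resulting Series has a numeric dtype and no separate `astype(float)` step is needed.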


2 Comments

Fed Over a year ago

Thank you FLab. I used a mix of John's answer above by first using: finalDf2['colName1'] = finalDf2['colName1'].str.replace(' ', '0') ... finalDf2['colName9'] = finalDf9['colName9'].str.replace(' ', '0') on various columns. This is to get rid of any spaces after the periods. Then I used: finalDf.fillna(value=0, inplace=True) to convert NaN to 0s. Lastly, I slightly modified your function to handle another exception (my actual dataframe has some entries that are randomly .**) and applied it to various columns for a much cleaner df.

FLab Over a year ago

Glad it helped! Don't forget to up vote/accept if you found the answer useful, or upload your answer so it can be useful to others

Zero · Accepted Answer · 2017-07-05 10:21:21Z

0

First, replace spaces.

In [344]: s = finalDf2['Notional'].str.replace(' ', '0')

Then, extract numerical part, and 'K' part, replacing K with 1000.

In [345]: (s.str.extract(r'(-?[\d\.]+)', expand=False).astype(float) *
           s.str.extract(r'([K]+)', expand=False).replace([np.nan, 'K'], [1, 1000]) )
Out[345]:
0     0.00
1    -0.02
2    35.00
3    30.00
4   -50.00
5     0.00
Name: Notional, dtype: float64
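
A variation on the same idea, not the exact method above, is sketched here: it builds the multiplier from a boolean mask instead of a second `str.extract`. It assumes the 'K' only ever appears as a trailing suffix and that a space stands for a missing zero.

```python
import numpy as np
import pandas as pd

notional = pd.Series(["0", "-0.02", "35", ". 3K", "-0.05K", "0"], name="Notional")

# Assumption carried over from the answer: a space stands for a missing zero.
cleaned = notional.str.replace(" ", "0", regex=False)

has_k = cleaned.str.endswith("K")   # boolean mask of rows with a 'K' suffix
digits = cleaned.str.rstrip("K")    # numeric text without the suffix

# Multiply 'K' rows by 1000 and leave the others unchanged.
multiplier = np.where(has_k, 1000.0, 1.0)
result = digits.astype(float) * multiplier
print(result)
```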


1 Comment

Fed Over a year ago

Thank you John. I used the first part. The second part yielded an error on my actual dataframe because of the periods, I think: "could not convert string to float: '.'"
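
The error reported here is likely caused by entries that consist of a lone '.', which the pattern `[\d\.]+` matches but `float` cannot parse. A minimal sketch of one way to tolerate such entries uses `pd.to_numeric` with `errors="coerce"`. The sample column is hypothetical and assumes spaces have already been replaced.

```python
import pandas as pd

# Hypothetical column containing a stray '.' like the one in the error message.
s = pd.Series(["35", ".03K", ".", "-0.05K"])

has_k = s.str.endswith("K")
# Unparseable text such as '.' becomes NaN instead of raising an error.
numbers = pd.to_numeric(s.str.rstrip("K"), errors="coerce")
# Keep plain numbers as they are and scale the 'K' rows by 1000.
result = numbers.where(~has_k, numbers * 1000)

print(s[numbers.isna()])  # the entries that could not be parsed
print(result)
```

Inspecting the rows that became NaN before filling them makes it easier to decide whether 0 is really the right replacement.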
