
From a daily report, I use:

pd.read_csv(filepath, sep = '\t')

to open a dataframe looking like the below (in simplified format):

finalDf2 = pd.DataFrame(dict(
            Portfolio = pd.Series(['Book1', 'Book1', 'Book2', 'Book3', 'Book1','Book1']), 
            Strike = pd.Series(['108','109.10', '111', '114', '108.3', '115.0']), 
            Notional = pd.Series(['0', '-0.02', '35', '. 3K', '-0.05K', '0' ]))
     )

By running the below on various entries under the "Notional" column:

type(finalDf2.iloc[ , ])

I see the 0s are of type int already.
The nonzero values however are strings. I tried to convert strings to floats by using:

finalDf2['Notional'].astype(float)

but before doing so, how could I convert all cells containing "K" values? For instance,

. 3K should end up being float or int 30
-0. 05K should end up being float or int -50

Unfortunately, the spaces are actually in the file and thus in the dataframe.

  • Does an extra space in decimal values represent a zero? So does ". 3" stand for ".03"? Commented Jul 5, 2017 at 10:18
  • ". 3K should end up being float or int 30" and "-0. 05K should end up being float or int -50": these two lines contradict each other. Should the space be converted to '0' or to ''? Commented Jul 5, 2017 at 10:29

2 Answers

Here is a possible solution:

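def notional_to_num(x):
    # Values that are already numeric pass through unchanged.
    if isinstance(x, (int, float)):
        return x
    elif isinstance(x, str):
        # Strings without 'K' are returned as-is; for 'K' values, treat the
        # embedded space as a zero, drop the trailing 'K' and scale by 1000.
        return x if 'K' not in x else float(x.replace(" ", "0")[:-1])*1e3
    else:
        # A bare `raise` has no active exception here, so raise explicitly.
        raise TypeError("Unsupported type: {}".format(type(x)))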

finalDf2.loc[:, 'Notional'] = finalDf2['Notional'].apply(notional_to_num)

This gives the following output:

  Notional Portfolio  Strike
0        0     Book1     108
1    -0.02     Book1  109.10
2       35     Book2     111
3       30     Book3     114
4      -50     Book1   108.3
5        0     Book1   115.0
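
Note that notional_to_num returns strings without a 'K' unchanged, so after apply the column still mixes strings and floats. Below is a minimal sketch of a variant that always returns a float. The name notional_to_float is hypothetical, and the rule that an embedded space stands for a zero is an assumption taken from the question's ". 3K" example.

```python
import pandas as pd


def notional_to_float(x):
    """Hypothetical variant of notional_to_num that always returns a float."""
    if isinstance(x, (int, float)):
        return float(x)
    if isinstance(x, str):
        # Assumption from the question: an embedded space stands for a zero.
        cleaned = x.strip().replace(" ", "0")
        if cleaned.endswith("K"):
            # Drop the trailing 'K' and scale by one thousand.
            return float(cleaned[:-1]) * 1e3
        return float(cleaned)
    raise TypeError("Unsupported type: {}".format(type(x)))


notional = pd.Series(['0', '-0.02', '35', '. 3K', '-0.05K', '0'])
converted = notional.apply(notional_to_float)
print(converted.dtype)
```

Because every branch returns a float, the resulting Series has a numeric dtype and no later astype(float) call is needed.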

2 Comments

Thank you FLab. I used a mix of your answer and John's. First I applied finalDf2['colName1'] = finalDf2['colName1'].str.replace(' ', '0') ... finalDf2['colName9'] = finalDf2['colName9'].str.replace(' ', '0') to various columns, to get rid of any spaces after the periods. Then I used finalDf.fillna(value=0, inplace=True) to convert NaN to 0s. Lastly, I slightly modified your function to handle another exception (my actual dataframe has some entries that are randomly .**) and applied it to various columns for a much cleaner df.
Glad it helped! Don't forget to upvote/accept if you found the answer useful, or post your own answer so it can be useful to others.
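
The column-by-column cleanup described in the first comment can be written as a loop. This is only a sketch: the small frame, the column list cols_to_clean, and the "109. 1" Strike value are made up for illustration, and dtype=object is set explicitly so that filling with the integer 0 behaves the same way across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real report; NaN marks a missing cell.
df = pd.DataFrame(
    {
        "Notional": ["0", ". 3K", np.nan],
        "Strike": ["108", "109. 1", "111"],
    },
    dtype=object,
)

# Columns that contain the stray spaces (illustrative names).
cols_to_clean = ["Notional", "Strike"]

for col in cols_to_clean:
    # Literal (non-regex) replacement of each space with a zero;
    # missing values are left as NaN by the .str accessor.
    df[col] = df[col].str.replace(" ", "0", regex=False)

# Replace the remaining NaN cells with 0, as in the comment.
df = df.fillna(0)
print(df)
```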
0

First, replace spaces.

In [344]: s = finalDf2['Notional'].str.replace(' ', '0')

Then extract the numerical part and the 'K' part, replacing K with 1000 (and a missing K with 1).

In [345]: (s.str.extract(r'(-?[\d\.]+)', expand=False).astype(float) *
           s.str.extract(r'([K]+)', expand=False).replace([np.nan, 'K'], [1, 1000]) )
Out[345]:
0     0.00
1    -0.02
2    35.00
3    30.00
4   -50.00
5     0.00
Name: Notional, dtype: float64

1 Comment

Thank you John. I used the first part. The second part yielded an error on my actual dataframe, because of the periods I think: "could not convert string to float: '.'"
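
One plausible cause of that error is an entry such as ".**" (mentioned in another comment): the pattern (-?[\d\.]+) matches a lone "." there, which float cannot parse. The sketch below, under that assumption, uses a stricter pattern that requires at least one digit and pd.to_numeric with errors='coerce', so unparseable cells become NaN instead of raising. The sample Series and the ".**" entry are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question plus one malformed entry (assumed).
s = pd.Series(['0', '-0.02', '35', '. 3K', '-0.05K', '.**'], dtype=object)

# Treat each embedded space as a zero, as in the answer above.
s = s.str.replace(' ', '0', regex=False)

# Optional sign, optional integer part, optional dot, at least one digit.
number = s.str.extract(r'(-?\d*\.?\d+)', expand=False)

# Anything that still fails to parse becomes NaN rather than an error.
values = pd.to_numeric(number, errors='coerce')

# Scale by 1000 where the cell ends with 'K', otherwise by 1.
multiplier = np.where(s.str.endswith('K', na=False), 1000.0, 1.0)

result = values * multiplier
print(result)
```

The malformed entry ends up as NaN, which can then be handled with fillna or dropna as appropriate.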
