4

Ok. I give up.

I have a DataFrame with a column ("Amount") of large numbers:

Amount
-1 000 000,00
 4 848 903,00
-2 949 234,00
13 038 023,00
 7 985 232,00
 ....

I want to convert these to numbers that I can calculate with.

Let's investigate:

>type(b["Amount"][0])
str

Ok, it's a string.

>float("-1 000 000,00".replace(' ', '').replace(',','.'))
-1000000.0

Ok, works great!

To make a lambda thingy (to process all elements in column), I need it in a function:

def make_float(num):
    num = num.replace(' ','').replace(',','.')
    return float(num)


>make_float(b["Amount"][0])
ValueError: could not convert string to float: −1 000 000.00

What?!

>b["Amount"][0].replace(' ','').replace(',','.')
Out[258]:
'\xe2\x88\x921\xc2\xa0000\xc2\xa0000.00'

Oh no!! Unicode hell! I give up.

Does Python have an easy function/method that will convert my numbers (including negative) to something I can calculate with?

6
  • I even tried a function that goes like this: def make_float(num): num = num.replace(',','.') num = num.replace(' ','') num = num.replace('\U00002013', '-') num = num.replace(u'\N{MINUS SIGN}', '-') num = num.decode('unicode_escape').encode('ascii','ignore') num = float(num) return num Commented Jan 30, 2018 at 12:24
  • What does print(b["Amount"][0]) print out? And in the make_float function, can you add a print(num) after you set the num variable and see what it prints out? Commented Jan 30, 2018 at 12:32
  • The function you wrote works fine. I guess the problem is with b["Amount"][0]. Commented Jan 30, 2018 at 12:34
  • I'm reading this file with pandas.read_csv. Changing the encoding might or might not help? Commented Jan 30, 2018 at 13:05
  • The value of print(b["Amount"][0]) is -1 000 000,00. Commented Jan 30, 2018 at 13:08

5 Answers

2

It looks like the problem is the minus symbol in the string: it is the Unicode minus sign ('−'), not the ASCII hyphen-minus ('-') that float() expects.

Try:

def make_float(num):
    num = num.replace(' ','').replace(',','.').replace("−", "-")
    return float(num)
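
One caveat: the output shown in the question ('\xc2\xa0' surviving the replace) suggests the separators are non-breaking spaces, which replace(' ', '') does not touch. Here is a minimal sketch of handling both characters, assuming Python 3 text strings; the sample value and the names NBSP, MINUS and still_dirty are made up for illustration.

```python
# Assumes Python 3 text strings; the sample value is hypothetical and mimics the question's data.
NBSP = "\u00a0"   # NO-BREAK SPACE
MINUS = "\u2212"  # MINUS SIGN

sample = MINUS + "1" + NBSP + "000" + NBSP + "000,00"

# Replacing the ordinary space leaves the non-breaking spaces untouched.
still_dirty = sample.replace(" ", "").replace(",", ".").replace(MINUS, "-")
print(NBSP in still_dirty)

def make_float(num):
    """Normalize both special characters, then parse."""
    cleaned = (
        num.replace(MINUS, "-")
        .replace(NBSP, "")
        .replace(" ", "")
        .replace(",", ".")
    )
    return float(cleaned)

print(make_float(sample))
```

The ordinary-space replace is kept so that values typed with plain spaces still parse.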

1

This should solve your issue. To get the first value of the column as a plain value, use pd.Series.values[0].

import pandas as pd

s = pd.Series(['-1 000 000,00'])

def make_float(num):
    num = num.replace(' ','').replace(',','.')
    return float(num)

s.map(make_float)

# 0   -1000000.0
# dtype: float64

make_float(s.values[0])
# -1000000.0


1

What if you try to encode it?

def make_float(num):
    num = num.encode('latin-1').replace(' ','').replace(',','.')
    return float(num)


1

Your data contains a Unicode minus sign (one of several minus signs in Unicode) and a non-breaking space (one of several space characters in Unicode).

You can use str.translate() to convert characters to a format that can be correctly parsed by float().

def make_float(num):
    return float(num.translate({0x2c: '.', 0xa0: None, 0x2212: '-'}))

make_float('−1\xa0000\xa0000,00')
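
If you are not sure which characters are hiding in a value, you can list them before building the translation table. This is a rough sketch, assuming Python 3; describe_chars and the sample string are hypothetical names for illustration, while ascii() and unicodedata.name() are standard library calls.

```python
import unicodedata

def describe_chars(text):
    """Return (code point, Unicode name) pairs for each non-ASCII character."""
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]

# Hypothetical value mimicking the question's data.
sample = "\u22121\u00a0000\u00a0000,00"

# ascii() escapes every non-ASCII character, so look-alikes become visible.
print(ascii(sample))

for code_point, name in describe_chars(sample):
    print(code_point, name)
```

For this sample the loop reports one MINUS SIGN and two NO-BREAK SPACE characters, which are exactly the keys 0x2212 and 0xa0 used in the table above.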


0

Ok. This seemed to do the trick. It's a solution in 3 steps.

  1. I checked my dataset with chardetect data.csv. It reported 'utf8' with a confidence of 0.99.

  2. I made sure to pass that into my pandas.read_csv: pandas.read_csv(data....., encoding = 'utf8')

  3. I made a function,

def make_float(num):
    num = num.replace(u'\N{MINUS SIGN}', '-')  # replace the Unicode minus sign with an ASCII hyphen-minus
    num = num.replace(',', '.')  # change the decimal separator from comma to dot
    num = num.replace(u'\xa0', '')  # remove non-breaking spaces (U+00A0, same as unichr(160))
    num = float(num)
    return num

I then passed this function to pandas.read_csv as a converter: `pandas.read_csv(data...., encoding='utf8', converters={'Amount': make_float})`

Working well so far.
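
For anyone who wants to try the converter approach without a file on disk, here is a small self-contained sketch. It assumes Python 3 and pandas; the semicolon-separated in-memory data, the Id column and the csv_text name are invented for the example, and only the Amount column name comes from my dataset.

```python
import io

import pandas as pd

def make_float(num):
    """Convert a string like '−1 000 000,00' (Unicode minus, non-breaking spaces) to float."""
    num = num.replace("\N{MINUS SIGN}", "-")  # Unicode minus -> ASCII hyphen-minus
    num = num.replace("\u00a0", "")           # drop non-breaking spaces
    num = num.replace(",", ".")               # decimal comma -> decimal point
    return float(num)

# Hypothetical data standing in for the real CSV file.
csv_text = (
    "Id;Amount\n"
    "1;\u22121\u00a0000\u00a0000,00\n"
    "2;4\u00a0848\u00a0903,00\n"
)

# converters applies make_float to every raw string in the Amount column.
df = pd.read_csv(io.StringIO(csv_text), sep=";", converters={"Amount": make_float})

print(df["Amount"].tolist())
print(df["Amount"].sum())
```

Because the conversion happens while the file is parsed, the column arrives as floats and can be summed straight away.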
