MySQL and Python: Convert string value to int, to float or what?

Question

I have a string that contains salary information in the following way:

salaryMixed = "£25,000 - £30,000"

Sometimes it will look like this:

salaryMixed = "EUR25,000 - EUR30,000"

And others times like this:

salaryMixed = "£37.50 - £50.00"

What I want to do is to remove all characters but the numeric values and then split the two values so as to place them into their own respective variables that reflect low banding and high banding. So far I have:

if salaryMixed.find('£')!=-1: # found £ char
    salaryMixed = salaryMixed.replace("£", "")
if salaryMixed.find('-')!=-1: # found hyphen
    salaryMixed = salaryMixed.replace("-", "")
if salaryMixed.find(',')!=-1: # found comma
    salaryMixed = salaryMixed.replace(",", "")
if salaryMixed.find('EUR')!=-1: # found EUR
    salaryMixed = salaryMixed.replace("EUR", "")
salaryMixed = re.sub('\s{2,}', ' ', salaryMixed) # to remove multiple space

if len(salaryList) == 1:
    salaryLow = map(int, 0) in salaryList
    salaryHigh = 00000
else:
    salaryLow = int(salaryList.index(1))
    salaryHigh = int(salaryList.index(2))

But I am stumped with how to split the two values up, and also how to handle the decimal point when salaryMixed isn't an annual salary but rather per hour as in the case of salaryMixed = "£37.50 - £50.00" because isn't that a float?

I am wanting to store this information in a MySQL DB later on in the code but I have described the table as:

CREATE TABLE jobs(
   job_id INT NOT NULL AUTO_INCREMENT,
   job_title VARCHAR(300) NOT NULL,
   job_salary_low INT(25),
   job_salary_high INT(25),
   PRIMARY KEY ( job_id )
);

What is the best approach here? Thanks.

Why not search for the regex [\d,.]+? That will tell you where both numbers are in the string. Then you can preprocess (remove commas etc) and transform it into a number. — Patashu
– Patashu, Commented Jun 12, 2013 at 3:58
Just a small point, your table schema has INT for the salary columns, but your values are floats. Better switch to DECIMAL at the MySQL end. — Burhan Khalid
– Burhan Khalid, Commented Jun 12, 2013 at 4:19

theodox · Accepted Answer · 2013-06-12 04:35:39Z

1

This is a good case for a regular expression from the python re module. And you'll probably want to upcast the hourly rates to annual (assuming you have a consistent average hourly

import re

def salary_band(val):
    currency = 'EUR' if 'EUR' in val else 'GBP'
    numbers = re.findall("[0-9.\,]*", val) # this will have a bunch of empty entries and two numbers
    numbers = [i.replace(",","") for i in numbers if i] # filter out empty strings, remove commas
    numbers = map(float, numbers) # convert to floats
    annual = lambda p: int(p) if p > 2000 else int( p * 1800) # your number here...
    return currency, map(annual, numbers)

print salary_band ( "gbp37.50 - gbp50.00")
print salary_band ( "EUR25,000 - EUR30,000")
>> ('GBP', [75000, 100000])
>> ('EUR', [25000, 30000])

Here i'm returning the currency type and the high/low numbers as a tuple - you can unpack it easily into your table

answered Jun 12, 2013 at 4:35

theodox

12.2k3 gold badges25 silver badges38 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

tlre0952b Over a year ago

Can you explain the line annual = lambda p: int(p) if p > 2000 else int( p * 1800) to me?

tlre0952b Over a year ago

Also, isn't the 2nd item of the tuple (the actual numbers) a list? So a list within a tuple? How do I "unpack" these numbers into respective variables? So far I have:

salaryTuple = salary_band(salaryMixed)                                            salaryLow = salaryTuple[0]                                            salaryHigh = salaryTuple[1]

Which isn't correct...

theodox Over a year ago

The lambda is an anonymous function. If the number is larger than 2000 just return it, otherwise return it times 1800 (which is a ballpark number for hours-worked-per-year) It really should be the same number in both cases, I changed it in one place and forgot to change it in the other

John La Rooy · Accepted Answer · 2013-06-12 04:13:47Z

What I want to do is to remove all characters but the numeric values and then split the two values so as to place them into their own respective variables that reflect low banding and high banding. So far I have:

Ok taking this one step at a time. Remove all the characters but the numeric values (Better keep spaces and periods too)

>>> testcases =  ["£25,000 - £30,000", "EUR25,000 - EUR30,000", "£37.50 - £50.00"]
>>> res = [''.join(x for x in tc if x.isdigit() or x.isspace() or x == '.') for tc in testcases]
>>> res
['25000  30000', '25000  30000', '37.50  50.00']

ok, now split them

>>> res = [x.split() for x in res]
>>> res
[['25000', '30000'], ['25000', '30000'], ['37.50', '50.00']]

Convert to floats (Decimal might be better)

>>> res = [[float(j) for j in i] for i in res]>>> res
[[25000.0, 30000.0], [25000.0, 30000.0], [37.5, 50.0]]

Put in separate variables

>>> for low, high in res:
...     print (low, high)
... 
25000.0 30000.0
25000.0 30000.0
37.5 50.0

regex as suggested by @Patashu is the easy/lazy way to do it though

Kapila Clan · Accepted Answer · 2013-06-12 04:23:50Z

0

for storing the values in db, you can use MySQLdb library in python.It's easy to use and will store al your data to database. Here check it out.

You can install it by apt-get install python-mysqldb

answered Jun 12, 2013 at 4:23

Kapila Clan

851 gold badge3 silver badges10 bronze badges

1 Comment

tlre0952b Over a year ago

Thanks, though I forgot to mention but I already have that import.

Collectives™ on Stack Overflow

MySQL and Python: Convert string value to int, to float or what?

3 Answers 3

3 Comments

Comments

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

3 Comments

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related