pandas regular expressions in functions

Question

I'd like to create a new column in a pandas dataframe from the results of a regular expression.

The result I'm expecting is:

In[1]: df
Out[1]: 

    valueProduct    valueService      totValue
0     $465580.99      $322532.34    $788113.33

My dataframe dtypes are:

df.dtypes

Contracting Office Name               object
Contracting Office Region             object
PIID                                  object
PIID Agency ID                        object
Major Program                         object
Description of Requirement            object
Referenced  IDV PIID                  object
Completion Date               datetime64[ns]
Prepared By                           object
Funding Office Name                   object
Funding Agency ID                     object
Funding Agency Name                   object
Funding Office ID                     object
Effective Date                datetime64[ns]
Fiscal Year                            int64
Ultimate Contract Value              float64
Count                                  int64

In the first row, the column titled "Description of Requirement" holds the following long string (the other rows in this column hold similar strings):

STEWARDSHIP ADD ADDITIONAL VOLUME AND ROAD WORK CHANGES SILVER SLIDE STEWARDSHIP PROJECT - ALLEGHENY NATIONAL FOREST VALUE OF PRODUCT = $465580.99 VALUE OF SERVICE = $322532.34 TOTAL VALUE OF CONTRACT = $788113.33

I want to write a regex that finds these three items in the string and puts only their dollar values into new columns:

VALUE OF PRODUCT = $465580.99
VALUE OF SERVICE = $322532.34
TOTAL VALUE OF CONTRACT = $788113.33

Here's code that extracts the first item when the text is a plain Python string rather than a dataframe column:

import re

text = "STEWARDSHIP ADD ADDITIONAL VOLUME AND ROAD WORK CHANGES SILVER SLIDE STEWARDSHIP PROJECT - ALLEGHENY NATIONAL FOREST VALUE OF PRODUCT = $465580.99 VALUE OF SERVICE = $322532.34 TOTAL VALUE OF CONTRACT = $788113.33"

# raw string so that \$, \d and \. reach the regex engine unchanged
pattern = re.compile(r'(VALUE OF PRODUCT).{1,3}\$\d*\.\d*', re.IGNORECASE)
getPattern = pattern.search(text)
print(getPattern.group())

Which would produce:

VALUE OF PRODUCT = $465580.99

I can repeat this action for the other two items.
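To keep only the dollar amount rather than the whole `VALUE OF PRODUCT = $...` span, the amount can be wrapped in a capturing group and read back with `group(1)`. A minimal sketch for all three items on the plain string, assuming each label is followed by `=` and the amount (the original pattern's `.{1,3}` is replaced by `\s*=\s*` here); the dictionary keys reuse the column names from the expected output:

```python
import re

text = ("STEWARDSHIP ADD ADDITIONAL VOLUME AND ROAD WORK CHANGES SILVER SLIDE "
        "STEWARDSHIP PROJECT - ALLEGHENY NATIONAL FOREST VALUE OF PRODUCT = $465580.99 "
        "VALUE OF SERVICE = $322532.34 TOTAL VALUE OF CONTRACT = $788113.33")

# One pattern per item; the parenthesised group captures only the dollar amount
# (digits with optional cents), not the label in front of it.
patterns = {
    "valueProduct": re.compile(r"VALUE OF PRODUCT\s*=\s*(\$\d+(?:\.\d+)?)", re.IGNORECASE),
    "valueService": re.compile(r"VALUE OF SERVICE\s*=\s*(\$\d+(?:\.\d+)?)", re.IGNORECASE),
    "totValue": re.compile(r"TOTAL VALUE OF CONTRACT\s*=\s*(\$\d+(?:\.\d+)?)", re.IGNORECASE),
}

values = {}
for name, pattern in patterns.items():
    match = pattern.search(text)
    # group(1) is the captured amount; group() would be the whole "LABEL = $..." span
    values[name] = match.group(1) if match else None

print(values)
# {'valueProduct': '$465580.99', 'valueService': '$322532.34', 'totValue': '$788113.33'}
```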

Now, since I'm working with a dataframe, I tried the following:

def valProduct(row):
    pattern = re.compile(r'(VALUE OF PRODUCT).{1,3}\$\d*\.\d*', re.IGNORECASE)
    # search this row's description for the product value
    findPattern = re.search(pattern, row['Description of Requirement'])
    return findPattern

df['valueProduct'] = df.apply(lambda row: valProduct(row), axis=1)

In[2]: df[['valueProduct']][:1]
Out[2]:  None

This creates the new column, but it is empty; I expected it to show at least:

VALUE OF PRODUCT = $465580.99

Any help is greatly appreciated!
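In the question's function, `re.search` returns a `Match` object (or `None` when the pattern does not match), and the function hands back that object rather than the matched text. A minimal sketch of the same row-wise approach that returns the text instead, assuming the column name `Description of Requirement` from the dtypes listing; the one-row dataframe is a shortened, hypothetical stand-in for the real data, and the pattern adds a capturing group so only the dollar amount is kept:

```python
import re

import pandas as pd

# Same pattern as the question, plus a capturing group around the dollar amount.
PRODUCT_RE = re.compile(r"VALUE OF PRODUCT.{1,3}(\$\d*\.\d*)", re.IGNORECASE)


def valProduct(row):
    match = PRODUCT_RE.search(row["Description of Requirement"])
    # search() gives a Match object or None; return the captured text instead
    return match.group(1) if match else None


# Hypothetical one-row frame with a shortened version of the sample description
df = pd.DataFrame({
    "Description of Requirement": [
        "SILVER SLIDE STEWARDSHIP PROJECT VALUE OF PRODUCT = $465580.99 "
        "VALUE OF SERVICE = $322532.34 TOTAL VALUE OF CONTRACT = $788113.33"
    ]
})

# The function already takes the row, so no lambda wrapper is needed
df["valueProduct"] = df.apply(valProduct, axis=1)
print(df[["valueProduct"]])
```

Calling `.group()` instead of `.group(1)` would return the full `VALUE OF PRODUCT = $465580.99` span shown in the question's expected output.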

Charlie G · Accepted Answer · 2017-03-16 05:54:35Z


import re

text = "STEWARDSHIP ADD ADDITIONAL VOLUME AND ROAD WORK CHANGES SILVER SLIDE STEWARDSHIP PROJECT - ALLEGHENY NATIONAL FOREST VALUE OF PRODUCT = $465580.99 VALUE OF SERVICE = $322532.34 TOTAL VALUE OF CONTRACT = $788113.33"

re.findall(r'value.+?\d\b', text, re.I)

Output

['VALUE OF PRODUCT = $465580', 'VALUE OF SERVICE = $322532', 'VALUE OF CONTRACT = $788113']
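The answer's pattern ends in `\d\b`; because the decimal point counts as a word boundary, each match stops before the cents (`$465580` rather than `$465580.99`), and the third match begins at `VALUE OF CONTRACT` rather than `TOTAL VALUE OF CONTRACT`. A sketch of a vectorized alternative that keeps the cents and fills the three columns from the question's expected output, using pandas `Series.str.extract`; it assumes each label is followed by `=` and a dollar amount, as in the sample string, and uses a one-row dataframe as a stand-in for the real data:

```python
import re

import pandas as pd

# One-row stand-in for the real dataframe, using the sample string from the question
text = (
    "STEWARDSHIP ADD ADDITIONAL VOLUME AND ROAD WORK CHANGES SILVER SLIDE "
    "STEWARDSHIP PROJECT - ALLEGHENY NATIONAL FOREST VALUE OF PRODUCT = $465580.99 "
    "VALUE OF SERVICE = $322532.34 TOTAL VALUE OF CONTRACT = $788113.33"
)
df = pd.DataFrame({"Description of Requirement": [text]})

# New column name -> label that precedes the amount in the description
labels = {
    "valueProduct": "VALUE OF PRODUCT",
    "valueService": "VALUE OF SERVICE",
    "totValue": "TOTAL VALUE OF CONTRACT",
}

for column, label in labels.items():
    # One capturing group with expand=False returns a Series, assigned as a new column.
    # (?:\.\d+)? keeps the cents; rows without a match get NaN.
    df[column] = df["Description of Requirement"].str.extract(
        re.escape(label) + r"\s*=\s*(\$\d+(?:\.\d+)?)",
        flags=re.IGNORECASE,
        expand=False,
    )

print(df[["valueProduct", "valueService", "totValue"]])
```

Working on the whole column with `str.extract` avoids calling a Python function per row through `apply`, and rows where a label does not appear simply get `NaN` in the corresponding column.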

answered Mar 16, 2017 at 3:37 by LetzerWille; edited Mar 16, 2017 at 5:54 by Charlie G
