I'd like to create new columns in a pandas dataframe from the results of a regular expression.

The result I'm expecting is:

In[1]: df
Out[1]: 

    valueProduct    valueService      totValue
0     $465580.99      $322532.34    $788113.33

My dataframe dtypes are:

df.dtypes

Contracting Office Name               object
Contracting Office Region             object
PIID                                  object
PIID Agency ID                        object
Major Program                         object
Description of Requirement            object
Referenced  IDV PIID                  object
Completion Date               datetime64[ns]
Prepared By                           object
Funding Office Name                   object
Funding Agency ID                     object
Funding Agency Name                   object
Funding Office ID                     object
Effective Date                datetime64[ns]
Fiscal Year                            int64
Ultimate Contract Value              float64
Count                                  int64

The column titled "Description of Requirement" holds a long string in row 1 like the following (similar string values appear in this column throughout the dataset):

STEWARDSHIP ADD ADDITIONAL VOLUME AND ROAD WORK CHANGES SILVER SLIDE STEWARDSHIP PROJECT - ALLEGHENY NATIONAL FOREST VALUE OF PRODUCT = $465580.99 VALUE OF SERVICE = $322532.34 TOTAL VALUE OF CONTRACT = $788113.33

I want to write a regex that finds the following 3 items in this string, but puts only the dollar values into new columns:

VALUE OF PRODUCT = $465580.99
VALUE OF SERVICE = $322532.34
TOTAL VALUE OF CONTRACT = $788113.33

Here's code that does this when the text is a plain string outside of a dataframe:

import re

text = "STEWARDSHIP ADD ADDITIONAL VOLUME AND ROAD WORK CHANGES SILVER SLIDE STEWARDSHIP PROJECT - ALLEGHENY NATIONAL FOREST VALUE OF PRODUCT = $465580.99 VALUE OF SERVICE = $322532.34 TOTAL VALUE OF CONTRACT = $788113.33"

# Raw string so that \$ and \d are passed to the regex engine unchanged
pattern = re.compile(r'(VALUE OF PRODUCT).{1,3}\$\d*\.\d*', re.IGNORECASE)
getPattern = pattern.search(text)
print(getPattern.group())

Which would produce:

VALUE OF PRODUCT = $465580.99

I can repeat this action for the other two items.
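
As an illustration of repeating the search for all three items, the sketch below loops over the labels and uses a capture group so that only the dollar amount is kept. The names `labels` and `amounts` are hypothetical, and the pattern assumes every amount is written as `$` followed by digits, a dot, and more digits.

```python
import re

text = ("STEWARDSHIP ADD ADDITIONAL VOLUME AND ROAD WORK CHANGES SILVER SLIDE "
        "STEWARDSHIP PROJECT - ALLEGHENY NATIONAL FOREST "
        "VALUE OF PRODUCT = $465580.99 VALUE OF SERVICE = $322532.34 "
        "TOTAL VALUE OF CONTRACT = $788113.33")

# Hypothetical helper names; the labels are taken from the sample string.
labels = ["VALUE OF PRODUCT", "VALUE OF SERVICE", "TOTAL VALUE OF CONTRACT"]
amounts = {}
for label in labels:
    # Group 1 captures only the dollar amount that follows "<label> = "
    match = re.search(re.escape(label) + r"\s*=\s*(\$\d+\.\d+)", text, re.IGNORECASE)
    amounts[label] = match.group(1) if match else None

print(amounts)
```

Because `match.group(1)` is used instead of `match.group()`, the label text is left out of the result.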

Now, since I'm working in a dataframe, I tried to do something like the following:

def valProduct(row):
    pattern = re.compile(r'(VALUE OF PRODUCT).{1,3}\$\d*\.\d*', re.IGNORECASE)
    findPattern = re.search(pattern, row['Description of Requirement'])
    return findPattern

df['valueProduct'] = df.apply(lambda row: valProduct(row), axis=1)

In[2]: df[['valueProduct']][:1]
Out[2]:  None

This produces a new column, but it's empty. It should show at the very least:

VALUE OF PRODUCT = $465580.99

Any help is greatly appreciated!

1 Answer

import re    

text = "STEWARDSHIP ADD ADDITIONAL VOLUME AND ROAD WORK CHANGES SILVER SLIDE STEWARDSHIP PROJECT - ALLEGHENY NATIONAL FOREST VALUE OF PRODUCT = $465580.99 VALUE OF SERVICE = $322532.34 TOTAL VALUE OF CONTRACT = $788113.33"

re.findall(r'value.+?\d\b',text, re.I)

Output

['VALUE OF PRODUCT = $465580', 'VALUE OF SERVICE = $322532', 'VALUE OF CONTRACT = $788113']
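
Note that the pattern above stops at the first word boundary after a digit, so the cents are dropped and the labels stay in the result. To get only the dollar values into new dataframe columns, one possible approach is `Series.str.extract` with named groups, which pandas turns into column names. This is a minimal sketch, not a tested solution for the full dataset: the two-row `df` here is a hypothetical stand-in, and the pattern assumes the three items always appear in this order.

```python
import re
import pandas as pd

# Hypothetical stand-in for the real dataframe: one matching row, one without amounts.
df = pd.DataFrame({
    "Description of Requirement": [
        "STEWARDSHIP PROJECT - ALLEGHENY NATIONAL FOREST "
        "VALUE OF PRODUCT = $465580.99 VALUE OF SERVICE = $322532.34 "
        "TOTAL VALUE OF CONTRACT = $788113.33",
        "NO DOLLAR AMOUNTS IN THIS DESCRIPTION",
    ]
})

# Each named group becomes a column in the extracted frame.
pattern = (
    r"VALUE OF PRODUCT\s*=\s*(?P<valueProduct>\$\d+(?:\.\d+)?)"
    r".*?VALUE OF SERVICE\s*=\s*(?P<valueService>\$\d+(?:\.\d+)?)"
    r".*?TOTAL VALUE OF CONTRACT\s*=\s*(?P<totValue>\$\d+(?:\.\d+)?)"
)

extracted = df["Description of Requirement"].str.extract(pattern, flags=re.IGNORECASE)
df = df.join(extracted)  # rows without a match get NaN in the new columns

print(df[["valueProduct", "valueService", "totValue"]])
```

The values come back as strings such as `$465580.99`. If numbers are needed, stripping the `$` and converting with `pd.to_numeric` would be a reasonable next step.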