I'd like to create a new column in a pandas dataframe from the results produced from a regular expression.
The result I'm expecting is:
In[1]: df
Out[1]:
valueProduct valueService totValue
0 $465580.99 $322532.34 $788113.33
My dataframe dtypes are:
df.dtypes
Contracting Office Name object
Contracting Office Region object
PIID object
PIID Agency ID object
Major Program object
Description of Requirement object
Referenced IDV PIID object
Completion Date datetime64[ns]
Prepared By object
Funding Office Name object
Funding Agency ID object
Funding Agency Name object
Funding Office ID object
Effective Date datetime64[ns]
Fiscal Year int64
Ultimate Contract Value float64
Count int64
The column titled "Description of Requirements" in row 1 has a long string value of the following (similar string values in this column through out the dataset):
STEWARDSHIP ADD ADDITIONAL VOLUME AND ROAD WORK CHANGES SILVER SLIDE STEWARDSHIP PROJECT - ALLEGHENY NATIONAL FOREST VALUE OF PRODUCT = $465580.99 VALUE OF SERVICE = $322532.34 TOTAL VALUE OF CONTRACT = $788113.33
I want to successfully write a regex to extract 3 items from this string but only produce the dollar value in new columns:
VALUE OF PRODUCT = $465580.99
VALUE OF SERVICE = $322532.34
TOTAL VALUE OF CONTRACT = $788113.33
Here's the code to do this assuming the string in the dataframe were a simple string value outside of a dataframe:
text = "STEWARDSHIP ADD ADDITIONAL VOLUME AND ROAD WORK CHANGES SILVER SLIDE STEWARDSHIP PROJECT - ALLEGHENY NATIONAL FOREST VALUE OF PRODUCT = $465580.99 VALUE OF SERVICE = $322532.34 TOTAL VALUE OF CONTRACT = $788113.33"
pattern = re.compile('(VALUE OF PRODUCT).{1,3}\$\d*\.\d*', re.IGNORECASE)
getPattern = re.search(pattern, text)
print (getPattern.group())
Which would produce:
VALUE OF PRODUCT = $465580.99
I can repeat this action for the other two items.
Now, sense I'm working in a dataframe I tried to do something like the following:
def valProduct(row):
pattern = re.compile('(VALUE OF PRODUCT).{1,3}\$\d*\.\d*', re.IGNORECASE)
findPattern = re.search(pattern, row['Description of Requirement'])
return findPatter
df['valueProduct'] = df.apply(lambda row: valProduct(row), axis=1)
In[2]: sf[['valueProduct']][:1]
Out[2]: None
This produces a new column but its empty, but should show at the very least:
VALUE OF PRODUCT = $465580.99
Any help is greatly appreciated!