I want to extract dates from a pandas dataframe column of URLs. Here is my code:
import dateutil.parser as dparser
import pandas as pd
df_results["URL"] = df_results["URL"].astype("str") # String conversion
URLs = df_results["URL"].tolist() # List creation
for URL in URLs: # Loop through list
    date = dparser.parse(URL, fuzzy=True) # Parse date
    print date # Print date
However, I receive a ValueError: Unknown string format:
ValueError Traceback (most recent call last)
<ipython-input-23-fd55da2e8e1e> in <module>()
69
70
---> 71 df_results = parse_URL(df_final) # parse 2
72
73 print df_results.head()
<ipython-input-23-fd55da2e8e1e> in parse_URL(df_final)
51 URLs = df_results["URL"].tolist()
52 for URL in URLs:
---> 53 test = dparser.parse(URL,fuzzy=True)
54 print test
"_")
C:\Python27\lib\site-packages\dateutil\parser.pyc in parse(timestr, parserinfo, **kwargs)
1180 return parser(parserinfo).parse(timestr, **kwargs)
1181 else:
-> 1182 return DEFAULTPARSER.parse(timestr, **kwargs)
1183
1184
C:\Python27\lib\site-packages\dateutil\parser.pyc in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
557
558 if res is None:
--> 559 raise ValueError("Unknown string format")
560
561 if len(res) == 0:
ValueError: Unknown string format
I assume that the URLs are stored as some sort of hyperlink. However, df.info() shows an object dtype for URL.
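To check that assumption, a minimal sketch along these lines could show whether the column already holds plain Python strings. The sample URLs and the variable names are made up for illustration, since the real data is not shown:

```python
import pandas as pd

# Made-up sample URLs; the real column contents are not shown here.
urls = pd.Series(
    ["http://example.com/a", "http://example.com/b"],
    dtype=object,
)

# Collect the Python type of every value stored in the column.
value_types = set(type(value) for value in urls)
print(value_types)

# Converting to str again should leave plain-string values unchanged.
unchanged = urls.astype(str).tolist() == urls.tolist()
print(unchanged)
```

If the only type reported is `str`, that would suggest the object dtype is simply how pandas labels a column of ordinary strings, and that the string conversion is not what triggers the error.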
Q1: How do I convert a pandas column of URLs to a raw string dtype?
Q2: How do I extract dates from a pandas DataFrame column of URLs and save them to a new column?
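For Q2, here is a minimal sketch of the kind of result I am after. It assumes the dates appear in the URLs as YYYY/MM/DD, YYYY-MM-DD or YYYY_MM_DD, and it uses a regular expression with `pd.to_datetime` instead of `dparser.parse`. The sample URLs, the pattern, and the `date` column name are illustrative assumptions, not my real data:

```python
import pandas as pd

# Hypothetical sample data standing in for the real URL column.
df_results = pd.DataFrame({
    "URL": [
        "http://example.com/news/2015/03/14/some-story.html",
        "http://example.com/archive/2016-11-02_report",
        "http://example.com/about",  # no date in this URL
    ]
})

# Assumed date layout: 4-digit year, 2-digit month, 2-digit day,
# separated by "/", "-" or "_".
DATE_PATTERN = r"(\d{4}[/_-]\d{2}[/_-]\d{2})"

# Pull out the first match per row; rows without a match become NaN.
date_strings = df_results["URL"].astype(str).str.extract(DATE_PATTERN, expand=False)

# Normalise the separators so a single format string applies.
date_strings = date_strings.str.replace(r"[/_]", "-", regex=True)

# Parse into datetimes; anything missing or invalid becomes NaT.
df_results["date"] = pd.to_datetime(date_strings, format="%Y-%m-%d", errors="coerce")

print(df_results)
```

With this approach, URLs that contain no date end up as `NaT` in the new column rather than raising an error, which is the behaviour I would like to keep if the URL format turns out to be different.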