I want to create a pandas DataFrame from a list of URLs, splitting each URL by its hierarchy and creating a new column for each part. More specifically, I want to break each URL up into protocol, domain, path, query, and fragment. I think this is doable with pandas, and I tried a solution I found, but it did not give me the expected result.
example data snippet
Here is an example data snippet in a CSV file, and here is my attempt:
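import pandas as pd
from urllib.parse import urlsplit  # Python 3; in Python 2 this lived in the `urlparse` module
# Load the CSV file, which has a 'url' column
df = pd.read_csv('example data snippet.csv')
# urlsplit returns (scheme, netloc, path, query, fragment) for each URL
df['protocol'], df['domain'], df['path'], df['query'], df['fragment'] = zip(*df['url'].map(urlsplit))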
The attempt above wasn't successful: its output doesn't match what I expect, so I am wondering whether there is a better way to do this with pandas. Can anyone point out how to make this work?
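One possible reason, assuming the URLs in my file have no scheme (like the examples further down): `urlsplit` only recognizes a domain when the string contains `//`, so without it the whole string ends up in the path. A small sketch of that behaviour follows; prepending `http://` is my own assumption for illustration, not something in the data.

```python
from urllib.parse import urlsplit

url = "variety.com/2017/biz/news/tax-march-donald-trump-protest-1202031487/"

# Without a scheme, netloc stays empty and everything lands in .path
bare = urlsplit(url)

# With an assumed "http://" prefix, the domain is recognized as netloc
with_scheme = urlsplit("http://" + url)

print(repr(bare.netloc), repr(bare.path))
print(repr(with_scheme.netloc), repr(with_scheme.path))
```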
desired output
I want to split each URL and create a new column for each component. The columns of my final pandas DataFrame would look like this:
df.columns=['id', 'title', 'news source', 'topic', 'news category']
For example, for these URLs:
'variety.com/2017/biz/news/tax-march-donald-trump-protest-1202031487/'
'variety.com/2018/film/news/list-2018-oscar-nominations-1202668757/'
I would expect:
news source = ['variety.com', 'variety.com']
topic = ['tax-march-donald-trump-protest', 'list-2018-oscar-nominations']
news category = ['biz', 'film']
How can I do this kind of parsing for a given list of URLs and add the results as new columns in a pandas DataFrame? Thanks in advance.
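To make the goal concrete, here is a minimal sketch of the kind of parsing I have in mind. It assumes every URL follows the same domain/year/category/section/slug layout as the two examples and that the slug ends in a numeric id. The inline DataFrame is only a stand-in for my CSV file, and I am not sure this is the best approach.

```python
import pandas as pd

# Stand-in for the CSV data: just the two example URLs from above.
df = pd.DataFrame({
    "url": [
        "variety.com/2017/biz/news/tax-march-donald-trump-protest-1202031487/",
        "variety.com/2018/film/news/list-2018-oscar-nominations-1202668757/",
    ]
})

# Trim the trailing slash, then split on "/" into one column per segment.
# Assumed layout: 0=domain, 1=year, 2=category, 3=section, 4=slug.
parts = df["url"].str.strip("/").str.split("/", expand=True)

df["news source"] = parts[0]
df["news category"] = parts[2]
# Remove the trailing "-<digits>" id from the slug to get the topic.
df["topic"] = parts[4].str.replace(r"-\d+$", "", regex=True)

print(df[["news source", "topic", "news category"]])
```

This breaks as soon as a URL has a different number of path segments, so it is only meant to show the output I am after.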