
I have a CSV file with a domain column, and I am trying to derive a new column, new_url, from it for my project. The rule is: if the domain contains .cdn., take the part of the string before .cdn.; otherwise, parse the URL with the url_parser library. The if/else logic works, and printing parsed_url inside the loop shows the expected values, but the CSV file I write out (cleanurl.csv) has no new_url column. Could you help me, please?

import pandas as pd 
import url_parser
from url_parser import parse_url,get_url,get_base_url
import numpy as np 

df = pd.read_csv("C:\\Users\\myuser\\Desktop\\raw_data.csv", sep=';')

i=-1
for x in df['domain']:

    i=i+1
    print("*",x,"*") 

    if '.cdn.' in x:
        parsed_url=x.split('.cdn')[0]
        print(parsed_url)
        df.iloc[i]['new_url']=parsed_url
       
    else:
        parsed_url=get_url(x).domain +'.' + get_url(x).top_domain
        print(parsed_url)
        df.iloc[i]['new_url']=parsed_url

df.to_csv("C:\\Users\\myuser\\Desktop\\cleanurl.csv", sep=';')

1 Answer

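Use df.loc[row, 'column'] to create the new column. df.iloc[i]['new_url'] = ... is chained indexing: it assigns to the temporary row that df.iloc[i] returns, so df itself never gets the column.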

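for idx, x in df['domain'].items():
    if '.cdn.' in x:
        # Keep everything before the first '.cdn'
        parsed_url = x.split('.cdn')[0]
    else:
        # Otherwise rebuild domain + TLD with url_parser, as in the question
        parsed = get_url(x)
        parsed_url = parsed.domain + '.' + parsed.top_domain
    df.loc[idx, 'new_url'] = parsed_url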
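
To see the difference in isolation, here is a minimal sketch that compares the two assignment styles on a tiny frame. The sample domains are made up for illustration, and only the .cdn. branch is exercised so that the snippet does not depend on url_parser.

```python
import warnings

import pandas as pd

# Hypothetical sample data, not taken from the question's CSV.
df = pd.DataFrame({"domain": ["img.cdn.example.com", "static.cdn.example.org"]})

# Chained indexing: df.iloc[0] hands back a temporary Series, so the
# assignment adds 'new_url' to that temporary object, not to df.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas may warn about chained assignment
    df.iloc[0]["new_url"] = "img"
chained_created_column = "new_url" in df.columns

# A single .loc call writes into df itself and creates the column on demand.
for idx, x in df["domain"].items():
    df.loc[idx, "new_url"] = x.split(".cdn")[0]

print(chained_created_column)
print(df["new_url"].tolist())
```

The first print should show that the chained assignment left df without a new_url column, while the second lists the values written through .loc.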

1 Comment

thanks a lot, this approach solved the issue!
