I have a dataframe df:

ID  active_seconds  domain  subdomain   search_engine   search_term
0120bc30e78ba5582617a9f3d6dfd8ca    35  city-link.com  msk.city-link.com  None    None
0120bc30e78ba5582617a9f3d6dfd8ca    54  vk.com  vk.com  None    None
0120bc30e78ba5582617a9f3d6dfd8ca    34  mts.ru  shop.mts.ru  None    None
16c28c057720ab9fbbb5ee53357eadb7    4   facebook.com    facebook.com    None    None

I also have a list url = ['city-link.com', 'shop.mts.ru']. I need to update the subdomain column as follows. If subdomain equals one of the elements of url, leave it unchanged. If subdomain is not in url but domain is, overwrite subdomain with domain. Otherwise, leave the row unchanged. How can I do this with pandas? I tried a loop, but it takes a long time:

domains = df['domain']
subdomains = df['subdomain']
urls = ['yandex.ru', 'vk.com', 'mail.ru']
for (domain, subdomain) in zip(domains, subdomains):
    if subdomain in urls:
        continue
    elif domain in urls and subdomain not in urls:
        df['subdomain'].replace(subdomain, domain, inplace=True)

1 Answer

First, select the rows whose domain field is in the urls list:

domains_in_urls = df[df.domain.isin(urls)]

Next, from those rows, keep only the ones whose subdomain field is not in urls:

subdomains_not_in_urls = domains_in_urls[~domains_in_urls.subdomain.isin(urls)]

Finally, use the index of those rows to overwrite the subdomain field with the domain field in the original dataframe:

df.loc[subdomains_not_in_urls.index, 'subdomain'] = \
        df.loc[subdomains_not_in_urls.index, 'domain']
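
The three steps above can also be folded into a single boolean mask, which avoids building the intermediate dataframes. The following is a minimal sketch, assuming a small sample frame rebuilt from the rows in the question (only the domain and subdomain columns) and the url list from the question text; the name mask is introduced here for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question (only the relevant columns).
df = pd.DataFrame({
    "domain": ["city-link.com", "vk.com", "mts.ru", "facebook.com"],
    "subdomain": ["msk.city-link.com", "vk.com", "shop.mts.ru", "facebook.com"],
})
urls = ["city-link.com", "shop.mts.ru"]

# True where domain is in urls but subdomain is not.
mask = df["domain"].isin(urls) & ~df["subdomain"].isin(urls)

# Overwrite subdomain with domain only on the selected rows.
df.loc[mask, "subdomain"] = df.loc[mask, "domain"]

print(df)
```

With this sample, only the first row changes: its subdomain becomes city-link.com. The shop.mts.ru row is kept because its subdomain is already in urls, and the remaining rows match neither condition.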