0

I have two DataFrames, each with a column named "Domains" that holds domains from a different source. The domains are clean: they look like www.google.com or www.facebook.com, not like www.facebook.com/4938249084.

The goal is to find which domains in DF1 also appear in DF2 and, for each one that does, append the DF1 domain to a list. My attempt is below.

For scale, DF1 has ~4,000 records and DF2 has ~7,000 records.

DF1

                 Domains
0                              NaN
1    www.hawaiiantimeathletics.com
2                              NaN
3              www.beach-elite.com
4                              NaN
5           www.dreamingoldvbc.com
6           www.pacificunionvb.com
7           www.alabamajuniors.com
8     www.birminghamvolleyball.com
9         www.magiccitythunder.com

DF2

                 Domains
0            www.labcsandiego.com
1         www.ahavavolleyball.com
2        www.northernelite-va.com
3               www.theedgevc.com
4                  divadallas.org
5            www.beach-elite.com
6         650xtremevolleyball.org
7             www.clubsouthvb.com
8    www.northidahovolleyball.com
9        wajvolleyball.site123.me

In this example, the only domain that should be appended to the list is 'www.beach-elite.com'.

Here's the code I wrote:

def match_domain(col1,col2):
    list1 = []
    for a in col1:
        v1 = a
    for b in col2:
        v2 = b 
        if v1 == v2:
            list1.append(v1)
            print(v1)
        elif v1 != v2:
            print('none')

match_domain(DF1, DF2)

Thank you in advance!

2 Answers

1

You can use the pandas isin method for this. It returns a boolean mask marking which values of one column appear in another.

import pandas as pd

# Sample data from the question; None marks the missing (NaN) entries.
df1 = pd.DataFrame({'Domains': [None, 'www.hawaiiantimeathletics.com', None, 'www.beach-elite.com', None, 'www.dreamingoldvbc.com', 'www.pacificunionvb.com', 'www.alabamajuniors.com', 'www.birminghamvolleyball.com', 'www.magiccitythunder.com']})
df2 = pd.DataFrame({'Domains': ['www.labcsandiego.com', 'www.ahavavolleyball.com', 'www.northernelite-va.com', 'www.theedgevc.com', 'divadallas.org', 'www.beach-elite.com', '650xtremevolleyball.org', 'www.clubsouthvb.com', 'www.northidahovolleyball.com', 'wajvolleyball.site123.me']})

# Keep the df1 domains that also occur in df2, without duplicates.
df1.Domains[df1.Domains.isin(df2.Domains)].unique()

This returns the unique values of df1's Domains column that also appear in df2's Domains column, i.e. the intersection of the two columns.
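Since the question asks for a plain list and DF1 contains missing values, here is a minimal sketch that builds on isin, drops the NaN rows first, and converts the result to a Python list. The two small frames are shortened stand-ins for the data in the question, and the names candidates and matches are only illustrative.

```python
import numpy as np
import pandas as pd

# Shortened stand-ins for DF1 and DF2 from the question (assumption: missing
# values are real NaN, not the string 'NaN').
df1 = pd.DataFrame({"Domains": [np.nan, "www.hawaiiantimeathletics.com", np.nan,
                                "www.beach-elite.com", "www.dreamingoldvbc.com"]})
df2 = pd.DataFrame({"Domains": ["www.labcsandiego.com", "divadallas.org",
                                "www.beach-elite.com"]})

# Drop missing values so they can never be reported as a match.
candidates = df1["Domains"].dropna()

# Boolean mask: True where a DF1 domain also occurs in DF2.
mask = candidates.isin(df2["Domains"])

# unique() removes repeats; tolist() turns the array into a plain list.
matches = candidates[mask].unique().tolist()
print(matches)
```

With a few thousand rows on each side this should run quickly, because the comparison happens inside pandas rather than in a Python loop.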


0

This can be solved with a list comprehension. Convert the two Domains columns to lists and look up each item of one list in the other, as follows:

df1:

Domains
NaN
www.hawaiiantimeathletics.com
NaN
www.beach-elite.com
NaN
www.dreamingoldvbc.com
www.pacificunionvb.com
www.alabamajuniors.com
www.birminghamvolleyball.com
www.magiccitythunder.com

df2:

Domains
www.labcsandiego.com
www.ahavavolleyball.com
www.northernelite-va.com
www.theedgevc.com
divadallas.org
www.beach-elite.com
650xtremevolleyball.org
www.clubsouthvb.com
www.northidahovolleyball.com
wajvolleyball.site123.me

Code:

df2_domains = df2['Domains'].tolist()  # build the lookup list once, not on every iteration
list1 = [domain for domain in df1['Domains'].tolist() if domain in df2_domains]
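For reference, the loop in the question fails for two reasons. Iterating over a DataFrame yields its column labels rather than its values, and the second loop is not nested inside the first, so only the last value of col1 is ever compared. A corrected version might look like the sketch below. It reuses the question's function name match_domain but assumes it is called with the two Domains columns rather than the whole DataFrames, and it swaps the list lookup for a set, which keeps each membership test cheap at ~4,000 x ~7,000 rows.

```python
import numpy as np
import pandas as pd


def match_domain(col1, col2):
    """Return the values of col1 that also occur in col2, in col1's order."""
    # A set gives constant-time membership tests on average.
    lookup = set(col2.dropna())
    # Skip missing values in col1 and keep only domains present in col2.
    matched = [domain for domain in col1.dropna() if domain in lookup]
    # dict.fromkeys drops duplicates while preserving first-seen order.
    return list(dict.fromkeys(matched))


# Shortened stand-ins for the question's data (illustrative only).
df1 = pd.DataFrame({"Domains": [np.nan, "www.beach-elite.com",
                                "www.dreamingoldvbc.com", "www.beach-elite.com"]})
df2 = pd.DataFrame({"Domains": ["www.labcsandiego.com", "www.beach-elite.com"]})

# Pass the columns (Series), not the DataFrames themselves.
result = match_domain(df1["Domains"], df2["Domains"])
print(result)
```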
