0

I have a dataset called data. There's a column called networkDomain, and data['networkDomain'] looks like this:

0                amazonaws.com
1               vodafone-ip.de
2             ask4internet.com
3                   actcorp.in
4                    (not set)
5                    (not set)
6                   druknet.bt
7              unknown.unknown
8         alliancebroadband.in
9                  vsnl.net.in
10          grandenetworks.net
11             superonline.net
12                   (not set)
13             unknown.unknown
14             unknown.unknown
15                  fidnet.com
16                   (not set)
17             telepacific.net
18                    pldt.net
19        networkbackup.com.au

I would like to filter all the values ending with '.com' or '.net' using regex and assign all other values as 0.

I've tried data['networkDomain'][data['networkDomain'].str.contains(".com$|.net$", regex=True)] which returns:

0                  amazonaws.com
2               ask4internet.com
10            grandenetworks.net
11               superonline.net
15                    fidnet.com
17               telepacific.net
18                      pldt.net
22                       tdc.net
24                     qwest.net
26                     hinet.net
27                     ztomy.com
29                netvigator.com
30                    level3.net
31                   virginm.net
32                        rr.com
41                 sbcglobal.net
49                      pldt.net
51                  1asiacom.net
56                     yesup.net
59                 btireland.net
60                     avast.com

How can I set all the other values in data['networkDomain'] that don't end in '.net' or '.com' to 0?

2
  • '0', NULL, or do you mean that you want to delete those values? Commented Jul 20, 2019 at 14:46
  • Hi Luuk I meant '0'. Commented Jul 20, 2019 at 15:16

3 Answers

1

You can use Series.apply on the column, which calls a function on each value and collects the results.

>>> import re
>>> import pandas as pd
>>> regex = re.compile(r"\.(?:com|net)$")  # escaped dot matches a literal "."
>>>
>>> def my_func(row):
...     if regex.search(row):
...         return row
...     return 0  # default
...
>>> df = pd.DataFrame(
...     [
...         {"Domain": " amazonaws.com"},
...         {"Domain": " amazonaws2.com"},
...         {"Domain": " amazonaws.net"},
...         {"Domain": "(not set)"},
...     ]
... )
>>>
>>> df["Domain"] = df["Domain"].apply(my_func)
>>> print(df)
            Domain
0    amazonaws.com
1   amazonaws2.com
2    amazonaws.net
3                0
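
The same result can be had without a per-value Python function. The following is a minimal sketch, assuming the column holds strings: Series.str.contains builds a boolean mask and Series.where keeps the values where the mask is True and substitutes 0 elsewhere. The sample values are copied from the question; dtype=object is an assumption made here so that strings and the integer 0 can share one column.

```python
import pandas as pd

# A few sample values from the question; object dtype lets str and int coexist.
data = pd.DataFrame(
    {
        "networkDomain": [
            "amazonaws.com",
            "vodafone-ip.de",
            "(not set)",
            "pldt.net",
            "networkbackup.com.au",
        ]
    },
    dtype=object,
)

# True where the value ends in a literal ".com" or ".net".
# na=False treats missing values as non-matches.
keep = data["networkDomain"].str.contains(r"\.(?:com|net)$", regex=True, na=False)

# where() keeps values where the mask is True and uses 0 everywhere else.
data["networkDomain"] = data["networkDomain"].where(keep, 0)

print(data)
```

Note that "networkbackup.com.au" becomes 0 here because the pattern is anchored with $ and the value ends in ".au".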

1

Find the rows that don't satisfy the condition and overwrite their values:

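import re
# Iterate over (index label, value) pairs so .loc works with any index,
# not only the default 0..n-1 range.
for idx, value in data['networkDomain'].items():
    # str() guards against non-string values such as NaN.
    if not re.search(r'\.(?:com|net)$', str(value)):
        data.loc[idx, 'networkDomain'] = 0
print(data)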
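
If the loop turns out to be slow on a large frame, the rows can also be selected with a boolean mask and overwritten in one .loc assignment. This is only a sketch: the sample frame and its non-default index labels are made up for illustration, and dtype=object is assumed so the column can hold both strings and 0.

```python
import pandas as pd

# Hypothetical sample with a non-default index to show that labels don't matter.
data = pd.DataFrame(
    {"networkDomain": ["fidnet.com", "(not set)", "druknet.bt", "hinet.net"]},
    index=[10, 20, 30, 40],
    dtype=object,
)

# True for values ending in ".com" or ".net"; missing values count as False.
matches = data["networkDomain"].str.contains(r"\.(?:com|net)$", na=False)

# Overwrite every non-matching row in a single assignment.
data.loc[~matches, "networkDomain"] = 0

print(data)
```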


1

Use Series.apply() to apply a function to every element of the column. Note that the args argument must be passed as a tuple:

from pandas import DataFrame
import re

d = {'col': [1, 2, 3], 'col2': ['a.net', 2, 3]}

df = DataFrame(d)

def mask0(s, pattern):
    # Convert to str so non-string values (here the ints 2 and 3) can be tested.
    s = str(s)
    if pattern.search(s):
        return s
    return 0

# Group the alternatives and anchor them at the end of the string.
pat = re.compile(r'\.(?:net|com)$')
df['col2'] = df['col2'].apply(mask0, args=(pat,))

print(df)
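
A note on the pattern: square brackets define a character class, not a group, so a pattern such as .+[\.net|\.com] only requires one of the characters . n e t | c o m to appear after at least one other character. The sketch below compares that form with a grouped, end-anchored alternative on a few values taken from the question (plus 'a.net' from the example above); the variable names are made up for illustration.

```python
import re

# Character class: matches any single one of . n e t | c o m
char_class = re.compile(r".+[\.net|\.com]")

# Grouped alternation anchored at the end of the string.
grouped = re.compile(r"\.(?:net|com)$")

samples = ["a.net", "pldt.net", "(not set)", "druknet.bt", "networkbackup.com.au"]

char_class_hits = [s for s in samples if char_class.match(s)]
grouped_hits = [s for s in samples if grouped.search(s)]

print(char_class_hits)  # every sample matches, including "(not set)"
print(grouped_hits)     # only the values ending in ".net" or ".com"
```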
