I am trying to extract the domain from each URL in one DataFrame column and put it into another column. It works on a single string, but it fails when I apply it to the DataFrame. How do I apply this to a DataFrame column?

Tried:

from urllib.parse import urlparse
import numpy as np
import pandas as pd
id1 = [1,2,3]
ls = ['https://google.com/tensoflow','https://math.com/some/website',np.nan]
df = pd.DataFrame({'id':id1,'url':ls})
df
# urlparse(df['url']) # ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
# df['url'].map(urlparse) # AttributeError: 'float' object has no attribute 'decode'

This works on a single string:

string = 'https://google.com/tensoflow'
parsed_uri = urlparse(string)
result = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)
result

The column I am looking for:

col3
https://google.com/
https://math.com/
nan

  • Please post the exact full error messages you're getting. Commented May 1, 2019 at 19:51
  • @ForceBru just added the error Commented May 1, 2019 at 19:58

1 Answer

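urlparse expects a single string, so it fails both on the whole Series and on the NaN entry, which is a float.

You can use pandas.Series.apply() to call it on each value and skip the missing ones.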
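Both errors in the question come from handing urlparse something that is not a string. A minimal sketch of a guard using only the standard library follows; the helper name base_url is hypothetical and not part of any library.

```python
from urllib.parse import urlparse


def base_url(value):
    """Return 'scheme://netloc/' for a string URL; pass anything else through."""
    # Missing values in a pandas object column are floats (NaN), not strings,
    # so they are returned unchanged instead of being handed to urlparse.
    if not isinstance(value, str):
        return value
    parts = urlparse(value)
    return f"{parts.scheme}://{parts.netloc}/"


print(base_url("https://google.com/tensoflow"))  # https://google.com/
print(base_url(float("nan")))                    # nan
```

A named function like this can be passed to pandas.Series.apply in place of a lambda.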

» Initialization and imports

>>> from urllib.parse import urlparse
>>> import pandas as pd
>>> id1 = [1,2,3]
>>> import numpy as np
>>> ls = ['https://google.com/tensoflow','https://math.com/some/website',np.nan]
>>> ls
['https://google.com/tensoflow', 'https://math.com/some/website', nan]
>>> 

» Inspect the newly created DataFrame.

>>> df = pd.DataFrame({'id':id1,'url':ls})
>>> df
   id                            url
0   1   https://google.com/tensoflow
1   2  https://math.com/some/website
2   3                            NaN
>>> 
>>> df["url"]
0     https://google.com/tensoflow
1    https://math.com/some/website
2                              NaN
Name: url, dtype: object
>>>

» Apply a function to the url column using pandas.Series.apply(func). The first variant keeps missing values as NaN; the second turns them into the string 'nan'.

>>> df["url"].apply(lambda url: "{uri.scheme}://{uri.netloc}/".format(uri=urlparse(url)) if not pd.isna(url) else np.nan)
0    https://google.com/
1      https://math.com/
2                    NaN
Name: url, dtype: object
>>> 
>>> df["url"].apply(lambda url: "{uri.scheme}://{uri.netloc}/".format(uri=urlparse(url)) if not pd.isna(url) else str(np.nan))
0    https://google.com/
1      https://math.com/
2                    nan
Name: url, dtype: object
>>> 
>>> 
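The two variants print almost the same, but they are not equivalent for later processing. A small sketch of the difference, assuming pandas and NumPy are installed; the Series names here are made up for illustration:

```python
import numpy as np
import pandas as pd

# Result shaped like the first variant: the missing value stays a real NaN.
with_nan = pd.Series(["https://google.com/", "https://math.com/", np.nan])

# Result shaped like the second variant: the missing value is the text 'nan'.
with_text = pd.Series(["https://google.com/", "https://math.com/", str(np.nan)])

print(with_nan.isna().tolist())   # [False, False, True]
print(with_text.isna().tolist())  # [False, False, False]
```

If the column will later go through isna(), dropna() or fillna(), keeping the real NaN is probably the safer choice.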

» Store the above result in a variable (not mandatory, just for convenience).

>>> s = df["url"].apply(lambda url: "{uri.scheme}://{uri.netloc}/".format(uri=urlparse(url)) if not pd.isna(url) else str(np.nan))
>>> s
0    https://google.com/
1      https://math.com/
2                    nan
Name: url, dtype: object
>>> 

» Finally, build a DataFrame with the result as col3.

>>> df2 = pd.DataFrame({"col3": s})
>>> df2
                  col3
0  https://google.com/
1    https://math.com/
2                  nan
>>> 
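Since the question asks for the result as another column, the Series can also be assigned straight back onto the original DataFrame. A minimal sketch, assuming pandas and NumPy are installed; it swaps the lambda for Series.map with na_action="ignore", which leaves NaN entries untouched, and the helper name base_url is hypothetical:

```python
from urllib.parse import urlparse

import numpy as np
import pandas as pd


def base_url(url):
    """Return 'scheme://netloc/' for a URL string."""
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.netloc}/"


df = pd.DataFrame({
    "id": [1, 2, 3],
    "url": ["https://google.com/tensoflow", "https://math.com/some/website", np.nan],
})

# na_action="ignore" propagates NaN without calling base_url on it,
# so no explicit pd.isna check is needed inside the function.
df["col3"] = df["url"].map(base_url, na_action="ignore")
print(df)
```

This keeps id, url and col3 together in one frame instead of creating a separate df2.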

» To confirm what s and df2 are, check their types (again, not mandatory).

>>> type(s)
<class 'pandas.core.series.Series'>
>>> 
>>> 
>>> type(df2)
<class 'pandas.core.frame.DataFrame'>
>>> 
