I am trying to parse URLs in a dataframe to get their 'path'. My dataframe has 3 columns: ['url'], ['impressions'], ['clicks']. I want to replace each URL with its path. Here is my code:

    import pandas as pd
    from urllib.parse import urlparse

    fic_in = 'file.csv'

    df = pd.read_csv(fic_in)
    obj = urlparse(df['url'])  # this is the line that raises the error
    df['url'] = obj.path
    print(df)

The CSV file contains thousands of URLs and 2 other columns of information about them. For a technical reason, I can't parse the URLs by manipulating the CSV directly; I have to parse them in the dataframe. When I execute this code, I get the following error, which I don't really understand:

  File "/Users/adamn/Desktop/test_lambda.py", line 33, in <module>
    obj = urlparse(df['url'])
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/parse.py", line 389, in urlparse
    url, scheme, _coerce_result = _coerce_args(url, scheme)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/parse.py", line 125, in _coerce_args
    return _decode_args(args) + (_encode_result,)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/parse.py", line 109, in _decode_args
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/parse.py", line 109, in <genexpr>
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/generic.py", line 1442, in __nonzero__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I understand that there is an error, but what am I doing that isn't allowed? And how can I fix it, or is there another way to get this done?

Thanks for helping.

  • Have you tried a regular expression filter, if that's not working for you? Commented May 25, 2021 at 16:24
  • Can you provide the whole stack trace around that error message? It might help troubleshoot this, if the existing answer doesn't already solve your problem. Commented May 25, 2021 at 20:20
  • @NischayNamdev Well no I haven't, I thought it would be easier with urllib because the library was made for it. Commented May 25, 2021 at 21:39
  • @joanis Yes of course, I will add the full error message in a comment. Commented May 25, 2021 at 21:42

1 Answer

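urlparse only takes one string at a time, not a Series. When you pass the whole column, urlparse evaluates it in a boolean context (the `if x` in the traceback), and pandas raises the ValueError because the truth value of a Series is ambiguous.

Try applying it to each element instead:

    df["url"] = df["url"].astype(str).apply(lambda x: urlparse(x).path)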
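
One caveat with `astype(str)`: missing values (which `read_csv` loads as float NaN, the likely source of the 'float' error) become the literal string "nan", so their parsed path is also "nan". If that matters, a variant that leaves missing values untouched could look like the sketch below. The helper name `url_to_path` and the sample rows are made up for illustration and stand in for the real file.csv.

```python
import pandas as pd
from urllib.parse import urlparse


def url_to_path(value):
    """Return the path of a single URL; leave missing values as they are."""
    if pd.isna(value):
        return value
    return urlparse(str(value)).path


# Hypothetical sample data standing in for pd.read_csv('file.csv')
df = pd.DataFrame(
    {
        "url": ["https://example.com/a/b?x=1", float("nan"), "https://example.com/c"],
        "impressions": [10, 5, 7],
        "clicks": [1, 0, 2],
    }
)

# apply() calls url_to_path once per element, so urlparse only ever sees one string
df["url"] = df["url"].apply(url_to_path)
print(df)
```

Rows with a missing URL stay missing, so they can still be found afterwards with `df["url"].isna()`.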

3 Comments

I just tried this and I got a new error that I do not really understand either : File "/Users/adamndubois/Desktop/test_lambda.py", line 33, in <module> df['url'] =df['url'].apply(lambda x: urlparse(x).path) AttributeError: 'float' object has no attribute 'decode'
@AdamD97 I modified the code to force the 'url' column to be a string. However, if it was not already a string, you should check that the data in the column is correct.
Yes, the type of my 'url' column was 'object'. It works that way, thank you very much!
