Changing object to string in python

Question

I'm trying to use the re library to substitute text with a regular expression. However, even after I use astype(str), the column that I want to change is still stored as object.

import os
import re
import pandas as pd


#https://www.geeksforgeeks.org/update-column-value-of-csv-in-python/


# reading the Excel file
df = pd.read_excel('Upload to dashboard - Untitled (1).xlsx', skiprows = 7)
df.drop(df.tail(7).index,inplace=True) # drop last n rows
print(df.head(3))

df['Publisher URL'] = df['Publisher URL'].fillna('')
df['Publisher URL'] = df['Publisher URL'].astype(str)
df['Publisher URL'] = df['Publisher URL'].str.replace('gerd#','')
print(df.dtypes)
df['Publisher URL2'] = df['Publisher URL'].str.replace('www.','')
trim = re.sub('(.*)(?:\bm\.)(.*)|(.*)','',df['Publisher URL'])
#https://docs.python.org/3/library/re.html#re.sub

print(df['trim'])
df.to_csv("C:/out.csv")

This is my output for print(df.dtypes):

[3 rows x 8 columns]
Unnamed: 0        object
Buying Agency     object
Advertiser        object
Publisher URL     object
Date              object
Buy Type          object
Ad Type           object
No. of Ads       float64
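The object dtype in that output is expected. In pandas versions before 3.0, astype(str) turns every value into a Python str but keeps the column in the generic object dtype, so the dtype alone does not show whether the conversion worked. A minimal sketch, assuming a small made-up Series in place of the spreadsheet column, checks the element types instead and shows the separate "string" dtype for comparison:

```python
import pandas as pd

# Hypothetical values standing in for the 'Publisher URL' column
urls = pd.Series(["m.gsmarena.com/page.php", "redfm.ie"])

as_str = urls.astype(str)
# In pandas before 3.0 this prints "object": the values are Python str
# objects held in a generic object array
print(as_str.dtype)
print(all(isinstance(v, str) for v in as_str))  # True

# The dedicated string extension dtype has to be requested by name
as_string = urls.astype("string")
print(as_string.dtype)
```

Either way, the values are already strings, so the dtype is not what makes re.sub fail.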

And my error is:

  File ~\.spyder-py3\temp.py:30 in <module>
    trim = re.sub('(.*)(?:\bm\.)(.*)|(.*)','',df['Publisher URL'])

  File ~\Anaconda3\lib\re.py:210 in sub
    return _compile(pattern, flags).sub(repl, string, count)

TypeError: expected string or bytes-like object
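The traceback comes from re.sub itself: its string argument must be a single str (or bytes), and a pandas Series is neither, whatever its dtype. A minimal sketch, using hypothetical sample values and a simplified pattern (^m\., not the question's regex), reproduces the error and then applies re.sub to each element with Series.map:

```python
import re
import pandas as pd

# Hypothetical sample values standing in for df['Publisher URL']
urls = pd.Series(["m.gsmarena.com/page.php", "redfm.ie"])

# re.sub works on one string at a time
print(re.sub(r"^m\.", "", "m.gsmarena.com/page.php"))  # gsmarena.com/page.php

# Passing the whole Series raises the same TypeError as in the question
try:
    re.sub(r"^m\.", "", urls)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Calling re.sub on each element (here via Series.map) avoids the error
trimmed = urls.map(lambda s: re.sub(r"^m\.", "", s))
print(trimmed.tolist())  # ['gsmarena.com/page.php', 'redfm.ie']
```

pandas also offers a vectorised route, Series.str.replace(..., regex=True), which is what the accepted answer uses.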

How can I change my script so I can use the re.sub function?

The regex is correct; I had some help from this question: Regex Exclusions
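One detail to check when a pattern moves from regex101 into Python source: in an ordinary string literal such as '(.*)(?:\bm\.)(.*)|(.*)', Python converts \b into a backspace character before re ever sees it, so the word boundary is lost. The patterns in the answer are written as raw strings (r'...'), which keep the backslash. A minimal sketch of the difference, using a shortened pattern for illustration:

```python
import re

plain = "\bm"  # ordinary literal: "\b" becomes the backspace character \x08
raw = r"\bm"   # raw literal: the regex engine receives a backslash and "b"

print(len(plain), len(raw))  # 2 3

# The ordinary literal searches for a backspace, which URLs don't contain
print(re.search(plain, "m.gsmarena.com"))  # None

# The raw literal is a real word boundary and matches the leading "m"
print(re.search(raw, "m.gsmarena.com") is not None)  # True

# The boundary is also what keeps "redfm.ie" from matching "\bm\."
print(re.search(r"\bm\.", "redfm.ie"))  # None
```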

You'll also need to deal with nulls. So do df['Publisher URL2'] = df['Publisher URL2'].fillna('') or something like that, followed by df['Publisher URL2'] = df['Publisher URL2'].astype(str). – chitown88, commented Oct 12, 2022 at 10:36
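Null handling matters because pandas .str methods skip missing values and leave them missing in the result. A minimal sketch with a hypothetical two-value Series and a simplified ^m\. pattern:

```python
import pandas as pd

# Hypothetical sample with one missing value
urls = pd.Series(["m.gsmarena.com/page.php", None])

# .str methods skip missing values, so the gap stays missing
replaced = urls.str.replace(r"^m\.", "", regex=True)
print(replaced.isna().tolist())  # [False, True]

# Filling nulls first, as suggested, turns the gap into an empty string
filled = urls.fillna("").astype(str).str.replace(r"^m\.", "", regex=True)
print(filled.tolist())  # ['gsmarena.com/page.php', '']
```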
You probably wanted df['trim'] = df['Publisher URL'].str.replace(r'(.*)(?:\bm\.)(.*)', r'\1\2', regex=True) – Wiktor Stribiżew, commented Oct 12, 2022 at 10:37
Hi @WiktorStribiżew, I want to replace the mobile versions of websites, so I want to find and remove the "m." prefix. For example, m.gsmarena.com/sony_xperia_5_iv-price-11838.php should become gsmarena.com/sony_xperia_5_iv-price-11838.php. However, when I try to find and replace "m.", it also matches websites like redfm.ie, which becomes redfie and changes the website name. – Aki, commented Oct 12, 2022 at 16:16
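The problem in this comment is easy to reproduce with plain Python strings. A minimal sketch using the two example values from the comment; the \b-based pattern here is a simplified stand-in, not the exact regex from the answer:

```python
import re

urls = ["m.gsmarena.com/sony_xperia_5_iv-price-11838.php", "redfm.ie"]

# Plain substring replacement also hits the "m." inside "redfm.ie"
naive = [u.replace("m.", "") for u in urls]
print(naive)  # ['gsmarena.com/sony_xperia_5_iv-price-11838.php', 'redfie']

# Requiring a word boundary (\b) before "m" leaves "redfm.ie" alone,
# because "f" and "m" are both word characters (no boundary between them)
bounded = [re.sub(r"\bm\.", "", u, count=1) for u in urls]
print(bounded)  # ['gsmarena.com/sony_xperia_5_iv-price-11838.php', 'redfm.ie']
```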
So, df['trim'] = df['Publisher URL'].str.replace(r'^(.*?)\bm\.', r'\1', regex=True) will work as you need. – Wiktor Stribiżew, commented Oct 12, 2022 at 17:26
It seems you have been tricked by the regex101 flavor setting, too. You are using the PCRE option, where the replacement backreference is $1; in Python, it is \1. – Wiktor Stribiżew, commented Oct 12, 2022 at 17:33
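In Python's re, $ has no special meaning in a replacement string, so a PCRE-style $1 is inserted literally. A minimal sketch with the pattern from the comment above and a hypothetical URL:

```python
import re

pattern = r"^(.*?)\bm\."
url = "https://m.gsmarena.com/page.php"  # hypothetical example URL

# "$1" is copied into the result as-is
print(re.sub(pattern, "$1", url))  # $1gsmarena.com/page.php

# r"\1" is Python's backreference to group 1 (here "https://")
print(re.sub(pattern, r"\1", url))  # https://gsmarena.com/page.php
```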

Wiktor Stribiżew · Accepted Answer · 2022-10-12 17:32:23Z


You can use

df['trim'] = df['Publisher URL'].str.replace(r'^(.*?)\bm\.', r'\1', regex=True)

Or, a better idea is to use

df['trim'] = df['Publisher URL'].str.replace(r'^(.*?[/.])m\.', r'\1', regex=True)

See regex demo #1 and regex demo #2
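The two patterns can also be compared on a few values directly in pandas. A minimal sketch, assuming a DataFrame with a 'Publisher URL' column like the question's; the URLs are made-up variants of the ones in the comments, and trim1/trim2 are illustrative column names:

```python
import pandas as pd

# Hypothetical sample URLs; the question's real data is not shown
df = pd.DataFrame({"Publisher URL": [
    "https://m.gsmarena.com/sony_xperia_5_iv-price-11838.php",
    "m.gsmarena.com/sony_xperia_5_iv-price-11838.php",
    "redfm.ie",
]})

# Pattern 1: remove the first "m." that starts at a word boundary
df["trim1"] = df["Publisher URL"].str.replace(r"^(.*?)\bm\.", r"\1", regex=True)

# Pattern 2: remove the first "m." that directly follows a "/" or "."
df["trim2"] = df["Publisher URL"].str.replace(r"^(.*?[/.])m\.", r"\1", regex=True)

print(df[["trim1", "trim2"]].to_string())
```

With these samples both patterns keep redfm.ie intact; the second one only removes m. when a / or . comes right before it, so the bare leading m. in the second URL is left unchanged.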

Details:

^ - start of string
(.*?[/.]) - Group 1 (in Python, \1, not $1, refers to this group's match): any characters other than line break characters, as few as possible, followed by a / or a .
m\. - m. substring.
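The breakdown above maps one-to-one onto a verbose regex. A minimal sketch using re.VERBOSE, where whitespace and # comments in the pattern are ignored; the URL is a hypothetical example:

```python
import re

# The answer's second pattern, annotated part by part
pattern = re.compile(r"""
    ^           # start of string
    (.*?[/.])   # Group 1: as few non-newline chars as possible, then "/" or "."
    m\.         # the literal "m." substring
""", re.VERBOSE)

url = "https://m.gsmarena.com/page.php"
match = pattern.search(url)
print(match.group(1))              # https://  (the text \1 puts back)
print(pattern.sub(r"\1", url))     # https://gsmarena.com/page.php
```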

answered Oct 12, 2022 at 17:32

Wiktor Stribiżew


That's worked, thank you so much for the in-depth answer! I didn't know that regex101 had a separate Python option as well. – Aki
