Replace substring by substring in column of data frame

Question

I have a pandas data frame data with several columns. One of these columns is GEN, which contains German city names as strings. Some of these names are badly formatted, for example "Frankfurt a.Main". For every element in data['GEN'] I would like to replace every match of "\.[A-ZÄÖÜ]" (i.e. a dot followed by an upper-case letter) with the same dot and letter separated by a space. For example:

"Frankfurt a.Main" becomes "Frankfurt a. Main"
"Frankfurt a.d.Oder" becomes "Frankfurt a.d. Oder" and so on.

I am pretty sure that pandas.Series.str.contains and pandas.Series.str.replace are helpful here, but one of my problems is that I don't know how to put the replacement task in a form that can be used by the above functions.

Timeless · Accepted Answer · 2022-11-04 20:45:01Z


You can use pandas.Series.str.replace with two capture groups, one for the part of the German city name up to and including the last dot and one for the word that follows it, and then insert a space between them.

Try this:

data['GEN'] = data['GEN'].str.replace(r'(\w+\s.*\.)(\w*)', r'\1 \2', regex=True)

# Output:

0      Frankfurt a. Main
1    Frankfurt a.d. Oder
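
A narrower variant, sketched here as an assumption and not as part of the answer above, captures only the upper-case letter that directly follows a dot and writes it back after a space. This mirrors the "\.[A-ZÄÖÜ]" pattern from the question, and names that already contain the space stay unchanged. The sample rows are illustrative.

```python
import pandas as pd

# Illustrative sample data: two badly formatted names, one already
# correct name, and one name without any dot.
data = pd.DataFrame(
    {"GEN": ["Frankfurt a.Main", "Frankfurt a.d.Oder", "Frankfurt a. Main", "Berlin"]}
)

# Match a literal dot immediately followed by an upper-case letter.
# The letter is captured so it can be re-inserted after the added space.
data["GEN"] = data["GEN"].str.replace(r"\.([A-ZÄÖÜ])", r". \1", regex=True)

result = data["GEN"].tolist()
print(result)
```

Because the pattern needs the letter to touch the dot, running the replacement a second time changes nothing.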


The fourth bird · Accepted Answer · 2022-11-04 20:54:00Z


You could assert a dot to the left using a positive lookbehind, (?<=\.), and then match one of [A-ZÄÖÜ].

In the replacement, use a space followed by the full match, \g<0>.

import pandas as pd

pattern = r"(?<=\.)[A-ZÄÖÜ]"
items = [
    "Frankfurt a.Main",
    "Frankfurt a.d.Oder"
]
data = pd.DataFrame(items, columns=["GEN"])
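# regex=True is needed in pandas 2.0 and later, where str.replace
# treats the pattern as a literal string by default.
data['GEN'] = data['GEN'].str.replace(pattern, r' \g<0>', regex=True)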
print(data)

Output

                   GEN
0    Frankfurt a. Main
1  Frankfurt a.d. Oder
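
The lookbehind can also be tried outside pandas with the standard re module, which makes the mechanics easier to inspect. This is only a minimal sketch: the helper name add_space_after_dot and the extra sample strings are assumptions for illustration, not part of the answer.

```python
import re

# An upper-case letter that is immediately preceded by a literal dot.
# The lookbehind checks for the dot without consuming it, so the match
# itself is just the single letter.
PATTERN = re.compile(r"(?<=\.)[A-ZÄÖÜ]")


def add_space_after_dot(name: str) -> str:
    """Insert a space before each upper-case letter that directly follows a dot."""
    # \g<0> refers to the whole match, i.e. the upper-case letter.
    return PATTERN.sub(r" \g<0>", name)


samples = ["Frankfurt a.Main", "Frankfurt a.d.Oder", "Mülheim a.d.Ruhr", "Frankfurt a. Main"]
fixed = [add_space_after_dot(s) for s in samples]
print(fixed)
```

Since only a letter that touches the dot can match, strings that already contain the space pass through unchanged, and applying the function twice gives the same result as applying it once.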
