Add prefix to strings matching a specific pattern

Question

I have a DataFrame that contains dates:

import pandas as pd
df = pd.DataFrame({'month':['2','5','8'],'year':['2001',' 89','1999']})
print(df)
  month  year
0     2  2001
1     5    89
2     8  1999

I want to prefix all year values consisting of only 2 digits with 19, such that the resulting DataFrame is

  month  year
0     2  2001
1     5  1989
2     8  1999

I tried

pattern = r'[^\d]*\d{2}[^\d]*'
replacement = lambda m: '19'+m
df.year = df.year.str.replace(pattern,replacement)
print(df)
    month  year
0     2   NaN
1     5   NaN
2     8   NaN

This does not work. What is the problem?

df['year'] = df['year'].str.strip().apply(lambda x: '19' + x if len(x) == 2 else x)?
– Rakesh, Commented Jan 27, 2020 at 12:22
Yes, you may assume that all 2-digit instances need to be prefixed by 19.
– Oblomov, Commented Jan 27, 2020 at 12:23

tripleee · Accepted Answer · 2020-01-27 12:31:54Z

1

[^\d]* allows non-digit characters around the two digits, but because * also permits zero repetitions and the pattern is not anchored, it trivially matches any two adjacent digits inside a longer number as well. You want to match ^\d{2}$ instead.

(Also, [^\d] is better written \D.)
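As a rough illustration of the difference, the sketch below runs both patterns through Python's standard re module on plain strings rather than through pandas. The variable names loose and strict are made up for this example.

```python
import re

loose = r'[^\d]*\d{2}[^\d]*'   # the pattern from the question
strict = r'^\d{2}$'            # anchored: exactly two digits and nothing else

for year in ['2001', '89', '1999']:
    # findall shows every place the unanchored pattern matches;
    # search with the anchored pattern only succeeds for a bare 2-digit string
    print(year, re.findall(loose, year), bool(re.search(strict, year)))
```

The unanchored pattern finds two matches inside each four-digit year, so a replacement would be applied to both halves of it.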

A numeric comparison is probably much better than a regex here, though. Simply check if the number is smaller than 100.
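A minimal sketch of that numeric approach, assuming every year below 100 belongs to the 1900s (as the asker confirmed in the comments). Note that under this assumption a one-digit value such as '5' would also be shifted, to 1905.

```python
import pandas as pd

df = pd.DataFrame({'month': ['2', '5', '8'], 'year': ['2001', ' 89', '1999']})

# Strip stray whitespace, then convert the strings to integers
years = pd.to_numeric(df['year'].str.strip())

# Keep values that are already >= 100; add 1900 to everything else
years = years.where(years >= 100, years + 1900)

# Convert back to strings to match the original column type
df['year'] = years.astype(str)
print(df)
```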

edited Jan 27, 2020 at 12:31

answered Jan 27, 2020 at 12:23

tripleee


2 Comments

Wiktor Stribiżew Over a year ago

No, [^\d]* does not require anything as * matches zero or more chars.

tripleee Over a year ago

Thanks for the feedback; rephrased.

Wiktor Stribiżew · Accepted Answer · 2020-01-27 12:30:02Z

1

The lambda m: '19'+m is wrong because m is a match object (re.Match), not a string. You might have tried m.group(), but since the pattern also matches any non-digit characters on both ends of the number (such as whitespace), you might still get wrong results.

You may use

df['year'] = df['year'].str.strip().str.replace(r'^\d{2}$', r'19\g<0>', regex=True)

NOTES:

You need to get rid of leading/trailing whitespace with str.strip()
You need to match all strings that consist of just 2 digits with ^\d{2}$
The replacement is a concatenation of 19 and the match value (\g<0> is the whole match backreference).
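If a callable replacement is preferred, it has to read the matched text from the match object. The following is a sketch only; add_century is a hypothetical helper name, and regex=True is passed explicitly because recent pandas versions do not treat the pattern as a regex by default.

```python
import pandas as pd

df = pd.DataFrame({'month': ['2', '5', '8'], 'year': ['2001', ' 89', '1999']})

def add_century(m):
    # m is an re.Match object; m.group(0) is the matched text as a str
    return '19' + m.group(0)

# Strip whitespace first so that ' 89' can match the anchored pattern
df['year'] = df['year'].str.strip().str.replace(r'^\d{2}$', add_century, regex=True)
print(df)
```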

edited Jan 27, 2020 at 12:30

answered Jan 27, 2020 at 12:24

Wiktor Stribiżew


sammywemmy · Accepted Answer · 2020-01-27 12:29 (edited by halfer, 2023-09-26 22:31:20Z)

0

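Find the strings that have a length of two (after stripping whitespace) and prefix them with 19:

import numpy as np

# assign returns a new DataFrame; df itself is left unchanged
df.assign(year = np.where(df.year.str.strip().str.len()==2,
                          '19'+df.year.str.strip(),
                          df.year))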


  month  year
0     2  2001
1     5  1989
2     8  1999

edited Sep 26, 2023 at 22:31

halfer


answered Jan 27, 2020 at 12:29

sammywemmy
