Given a DataFrame that contains dates:

import pandas as pd
df = pd.DataFrame({'month':['2','5','8'],'year':['2001',' 89','1999']})
print(df)
  month  year
0     2  2001
1     5    89
2     8  1999

I want to prefix every year value that consists of only two digits with 19, so that the resulting DataFrame is:

  month  year
0     2  2001
1     5  1989
2     8  1999

I tried

pattern = r'[^\d]*\d{2}[^\d]*'
replacement = lambda m: '19'+m
df.year = df.year.str.replace(pattern,replacement)
print(df)
  month  year
0     2   NaN
1     5   NaN
2     8   NaN

This does not work: every year becomes NaN. What is the problem?

  • df['year'] = df['year'].str.strip().apply(lambda x: '19' + x if len(x) == 2 else x)? Commented Jan 27, 2020 at 12:22
  • Are you sure it should be prefixed with 19 in all cases? Commented Jan 27, 2020 at 12:22
  • Yes, you may assume that all 2 digit instances need to be prefixed by 19. Commented Jan 27, 2020 at 12:23
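A self-contained sketch of the first comment's suggestion, assuming (as confirmed above) that every two-digit year belongs to the 1900s:

```python
import pandas as pd

df = pd.DataFrame({'month': ['2', '5', '8'], 'year': ['2001', ' 89', '1999']})

# Strip stray whitespace first (' 89' -> '89'), then prefix '19' to any
# value that is exactly two characters long; other values pass through.
df['year'] = df['year'].str.strip().apply(lambda x: '19' + x if len(x) == 2 else x)
print(df)
```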

3 Answers

The pattern [^\d]*\d{2}[^\d]* is not anchored. Because * lets [^\d] match zero times, the pattern matches any run of two digits anywhere in the string, including the 20 in 2001. You want to match ^\d{2}$ instead (after stripping the leading space from ' 89').

(Also, [^\d] is better written \D.)
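A quick check with Python's re module illustrates the difference between the unanchored and anchored patterns (the values are shown already stripped of whitespace):

```python
import re

# The original, unanchored pattern: \D* may match zero characters,
# so it also finds two-digit runs inside longer numbers.
loose = re.compile(r'\D*\d{2}\D*')
# Anchored pattern: the whole string must be exactly two digits.
strict = re.compile(r'^\d{2}$')

for value in ['2001', '89', '1999']:
    print(value, bool(loose.search(value)), bool(strict.search(value)))
```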

A numeric comparison is probably much better than a regex here, though: convert the year to a number and check whether it is smaller than 100.
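A minimal sketch of the numeric approach, assuming (as the asker confirmed) that any year below 100 belongs to the 1900s. Note that it turns the column into integers; append .astype(str) if strings are needed.

```python
import pandas as pd

df = pd.DataFrame({'month': ['2', '5', '8'], 'year': ['2001', ' 89', '1999']})

# Strip whitespace, then convert the strings to integers.
years = pd.to_numeric(df['year'].str.strip())
# Keep years >= 100 unchanged; add 1900 to two-digit years (assumption: 1900s).
df['year'] = years.where(years >= 100, years + 1900)
print(df)
```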

2 Comments

No, [^\d]* does not require anything as * matches zero or more chars.
Thanks for the feedback; rephrased.

The lambda m: '19'+m is wrong because m is a match object (re.Match), not a string, so '19'+m raises a TypeError, which pandas reports as NaN for each element. You might have tried m.group(), but since your pattern also matches any non-digit characters on both sides of a number (such as whitespace), you would still get wrong results.
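To see why m.group() alone would not be enough, here is a small sketch that applies the original pattern with plain re.sub and a replacement function (prefix_19 is a hypothetical name used only for illustration):

```python
import re

pattern = r'[^\d]*\d{2}[^\d]*'

def prefix_19(m):
    # m is a match object; m.group() returns the matched text as a string.
    return '19' + m.group()

for value in ['2001', ' 89', '1999']:
    print(repr(value), '->', repr(re.sub(pattern, prefix_19, value)))
```

Every two-digit run is rewritten, so '2001' becomes '19201901', and the leading space of ' 89' is kept inside the match, giving '19 89'.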

You may use

df['year'] = df['year'].str.strip().str.replace(r'^\d{2}$', r'19\g<0>', regex=True)

NOTES:

  • You need to get rid of leading/trailing whitespace with str.strip()
  • You need to match all strings that consist of just 2 digits with ^\d{2}$
  • The replacement is a concatenation of 19 and the match value (\g<0> is the whole match backreference).
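A complete, runnable version of this approach on the question's data. It passes regex=True explicitly because pandas 2.0 changed the default of str.replace to regex=False:

```python
import pandas as pd

df = pd.DataFrame({'month': ['2', '5', '8'], 'year': ['2001', ' 89', '1999']})

# Strip whitespace, then prefix '19' only when the whole value is two digits.
# \g<0> inserts the entire match after the literal '19'.
df['year'] = df['year'].str.strip().str.replace(r'^\d{2}$', r'19\g<0>', regex=True)
print(df)
```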


Find the strings that have a length of two (after stripping whitespace) and prefix them with 19:

import numpy as np

year = df.year.str.strip()
df.assign(year=np.where(year.str.len() == 2, '19' + year, year))


  month  year
0     2  2001
1     5  1989
2     8  1999
