2

I want to change DateWork['Variable'] values as per multiple where conditions and update in DateWork['Date']

If Frequency=3 and len(Variable)=6 then replace M with "-0" and update in DateWork['Date'] If Frequency=3 and len(Variable)=7 then replace M with "-" and update in DateWork['Date']

DataFrame: DateWork

Frequency Variable      Date
3         1950M2        1950-02-01
3         1950M3        1950-03-01
2         1950-07-01    1950-07-01
3         1950M9        1950-09-01
2         1950-10-01    1950-10-01
3         1950M10       1950-10-01

My code:

DateWork.loc[DateWork['Date']] = np.where(((DateWork['Frequency'] == 3) & (DateWork['variable'].str.len() == 6)), 'M', '-0',  DateWork['Date'])
DateWork.loc[DateWork['Date']] = np.where(((DateWork['Frequency'] == 3) & (DateWork['variable'].str.len() == 7)), 'M', '-',  DateWork['Date'])
DateWork.loc[DateWork['Frequency'] == 3, 'Date'] = DateWork.loc[DateWork['Frequency'] == 3, 'variable'] + '-01'

This gives error:

TypeError: where() takes at most 3 arguments (4 given)

2 Answers 2

2

Your error is raised because you were passing one extra argument to np.where, you can check the documentation about this method, link is below. also once, this problem gets fixed, the way you wrote your code makes the last np.where call update and replace all the ones before, so they need to be "nested" in order to work properly.

I also provided a solution without np.where should you request it.

Solution with numpy.where:

# where frequenct == 3 and len(variable) == 6, we put variable and replace M with -0, if that's not
# the case, we search where frequency == 3 and len(variable) == 7 and put variable while replacing M with -
# else we just put Variable
DateWork['Date'] = np.where((DateWork['Frequency'] == 3) & (DateWork['Variable'].str.len() == 6), DateWork['Variable'].str.replace('M','-0'),
                       np.where((DateWork['Frequency'] == 3) & (DateWork['Variable'].str.len() == 7), DateWork['Variable'].str.replace('M','-'), DateWork['Variable']))

# we add first day date where frequency == 3
DateWork.loc[DateWork['Frequency'] == 3, 'Date'] = DateWork.loc[DateWork['Frequency'] == 3, 'Date'] + '-01'

Solution with pandas.dataframe.loc:

# where frenquency == 3 and len(variable) == 6, in date we put variable and replace M with -0
DateWork.loc[(DateWork['Frequency'] == 3) & (DateWork['Variable'].str.len() == 6),'Date'] = DateWork['Variable'].str.replace('M','-0')

# where frequency == 3 and len(variable) == 7, in date we put variable and replace M with -
DateWork.loc[(DateWork['Frequency'] == 3) & (DateWork['Variable'].str.len() == 7),'Date'] = DateWork['Variable'].str.replace('M','-')

# where frequency == 2, in date we simply put variable
DateWork.loc[DateWork['Frequency'] == 2,'Date'] = DateWork['Variable']

# where frequency == 3, in date we add first day date.
DateWork.loc[DateWork['Frequency'] == 3, 'Date'] = DateWork.loc[DateWork['Frequency'] == 3, 'Date'] + '-01'
Sign up to request clarification or add additional context in comments.

Comments

0

If nesting np.where is hard to read,

DateWork
Out[32]: 
   Frequency    Variable        Date
0          3      1950M2  1950-02-01
1          3      1950M3  1950-03-01
2          2  1950-07-01  1950-07-01
3          3      1950M9  1950-09-01
4          2  1950-10-01  1950-10-01
5          3     1950M10  1950-10-01

First if:

The else condition is the original Date column itself

DateWork['Date'] = np.where((DateWork['Frequency'] == 3) & (DateWork['Variable'].str.len() == 6), DateWork['Variable'].str.replace('M','-0'), DateWork['Date'])

DateWork
Out[34]: 
   Frequency    Variable        Date
0          3      1950M2     1950-02
1          3      1950M3     1950-03
2          2  1950-07-01  1950-07-01
3          3      1950M9     1950-09
4          2  1950-10-01  1950-10-01
5          3     1950M10  1950-10-01

Second If:

Here, the else condition is the output date column of the above step

DateWork['Date'] = np.where((DateWork['Frequency'] == 3) & (DateWork['Variable'].str.len() == 7), DateWork['Variable'].str.replace('M','-'), DateWork['Date'])

DateWork
Out[36]: 
   Frequency    Variable        Date
0          3      1950M2     1950-02
1          3      1950M3     1950-03
2          2  1950-07-01  1950-07-01
3          3      1950M9     1950-09
4          2  1950-10-01  1950-10-01
5          3     1950M10     1950-10

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.