How can I apply multiple conditions in pandas? For example, I have this DataFrame:

Country   VAT        
RO       RO1449488
RO       RO1449489
RO       RO1449486
MD       2980409450027

For rows where Country is RO, I want to strip the "RO" prefix from VAT and keep just the number. And if the Country is not RO and len(VAT) is 13, I want to add "03" in front of VAT.

The output should look like this:

Country   VAT        
RO       1449488
RO       1449489
RO       1449486
MD       032980409450027

I know how to do this with openpyxl, but pandas is new to me and I find its syntax harder to understand.

2 Answers

You can use boolean selection with numpy.select:

import numpy as np

df['VAT'] = np.select(
    # conditions, checked in order; the first match wins
    [df['Country'].eq('RO'), df['VAT'].str.len().eq(13)],
    # replacements, one per condition
    [df['VAT'].str.replace('^RO', '', regex=True), '03' + df['VAT']],
    # default for rows that match no condition
    df['VAT'],
)

output:

  Country              VAT
0      RO          1449488
1      RO          1449489
2      RO          1449486
3      MD  032980409450027
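
If it helps to run this end to end, here is a minimal self-contained sketch. It assumes the VAT column holds strings, rebuilds the sample frame from the question, and spells out the "Country is not RO" part of the second condition explicitly. The mask names is_ro and needs_prefix are just illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; VAT is assumed to be stored as strings.
df = pd.DataFrame({
    "Country": ["RO", "RO", "RO", "MD"],
    "VAT": ["RO1449488", "RO1449489", "RO1449486", "2980409450027"],
})

# Boolean masks, one per rule (names are illustrative).
is_ro = df["Country"].eq("RO")
needs_prefix = ~is_ro & df["VAT"].str.len().eq(13)

df["VAT"] = np.select(
    [is_ro, needs_prefix],
    [df["VAT"].str.replace("^RO", "", regex=True), "03" + df["VAT"]],
    default=df["VAT"],  # rows matching neither rule are left unchanged
)
print(df)
```

Naming the masks first keeps each rule readable on its own, and further rules can be appended to both lists in the same order.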

If you want an alternative to @mozway's one-liner, you can use apply() to run arbitrary logic over your DataFrame. Define the logic in a function:

def myFunc(x):
    if x['VAT'].startswith("RO"):
        result = x['VAT'][2:]
    elif x['Country'] != 'RO' and len(x['VAT']) == 13:
        result = "03" + x['VAT']
    # add other conditions here
    else:
        result = x['VAT']  # leave rows that match no condition unchanged

    return result

Then you can apply it to your DataFrame row by row with axis=1:

df['VAT'] = df.apply(myFunc, axis=1)
# Output
  Country              VAT
0      RO          1449488
1      RO          1449489
2      RO          1449486
3      MD  032980409450027

4 Comments

apply is equivalent to a loop, it will be slow on large datasets
I know that apply is closer to a loop than your solution, but isn't it still better to use a function from the pandas library than a plain custom loop, even one that avoids iterrows() or iteritems()?
In terms of performance, likely not (in terms of clarity, yes)
Note that I did not downvote your answer, it is still perfectly valid if performance is not required
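
As a possible middle ground between the two answers, the same rules can be written as boolean masks with .loc, which stays vectorized while reading close to the if/elif version. This is only a sketch on the question's sample data, assuming VAT is a string column; the mask names are illustrative.

```python
import pandas as pd

# Sample data from the question; VAT is assumed to be stored as strings.
df = pd.DataFrame({
    "Country": ["RO", "RO", "RO", "MD"],
    "VAT": ["RO1449488", "RO1449489", "RO1449486", "2980409450027"],
})

# Compute both masks up front, before any value is modified.
is_ro = df["Country"].eq("RO")
needs_prefix = ~is_ro & df["VAT"].str.len().eq(13)

# Rule 1: strip a leading "RO" from the VAT of RO rows.
df.loc[is_ro, "VAT"] = df.loc[is_ro, "VAT"].str.replace("^RO", "", regex=True)

# Rule 2: prepend "03" to 13-character VATs of non-RO rows.
df.loc[needs_prefix, "VAT"] = "03" + df.loc[needs_prefix, "VAT"]

print(df)
```

Rows that match neither mask are never touched, so no explicit default is needed here.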
