3

Is there a preferred method for doing a logical XOR in python?

For example, if I have two variables a and b, and I want to check that exactly one of them is truthy (one or the other, but not both), I have two methods:

Method 1 (bitwise operator):

if bool(a) ^ bool(b):
    do x

Method 2 (boolean operators):

if (not a and b) or (a and not b):
    do x

Is there an inherent performance benefit to using either one? Method 2 seems more "pythonic" but Method 1 looks much cleaner to me. This related thread seems to indicate that it might depend on what variable types a and b are in the first place!

Any strong arguments either way?

10
  • In what way does the other thread not answer your question? Those two are not equivalent, as detailed in the linked thread. Commented Oct 17, 2016 at 22:25
  • "basic"? You mean "boolean", right? Commented Oct 17, 2016 at 22:28
  • 1
    @TemporalWolf: Python doesn't have an "xor" boolean operator and I need to simulate that behavior in a script I'm writing. I'm asking specifically about 'pythonic' style/performance for two distinct xor implementations. I'm aware that they are not equivalent. Commented Oct 17, 2016 at 22:29
  • 1
    @dizzyf I would say def xor(a, b): return (a and not b) or (b and not a) is the most pythonic way to do it, then call xor(a, b) on things, although that assumes a and b are boolean values. wrap them if needed Commented Oct 17, 2016 at 22:43
  • 1
    I do not think it is a good idea to mark this as a duplicate, since the user himself mentioned both approaches. He is more interested in which one should be preferred, and why. Commented Oct 17, 2016 at 22:50

2 Answers

5

One alternative way to achieve this is to use any() and all(), like:

if any([a, b]) and not all([a, b]):
    print("Exactly one of a and b is truthy")

As for performance, here are the timeit results:

  1. Using any() and all(): 0.542 usec per loop

    moin@moin-pc:~$ python -m "timeit" "a='a';b='b';" "any([a, b]) and not all([a, b])"
    1000000 loops, best of 3: 0.542 usec per loop
    
  2. Using bool(a) ^ bool(b): 0.594 usec per loop

    moin@moin-pc:~$ python -m "timeit" "a='a';b='b';" "bool(a) ^ bool(b)"
    1000000 loops, best of 3: 0.594 usec per loop
    
  3. Using (not a and b) or (a and not b): 0.0988 usec per loop

    moin@moin-pc:~$ python -m "timeit" "a='a';b='b';" "(not a and b) or (a and not b)"
    10000000 loops, best of 3: 0.0988 usec per loop
    

Clearly, your (not a and b) or (a and not b) is the most efficient of the three: roughly 5.5 to 6 times faster than the other two on these str inputs.


A comparison of a few more variants built from and and or:

  1. Using a and not b or b and not a (as pointed out by TemporalWolf): 0.116 usec per loop

    moin@moin-pc:~$ python -m "timeit" "a='a';b='b';" "a and not b or b and not a"
    10000000 loops, best of 3: 0.116 usec per loop
    
  2. Using (a or b) and not (a and b): 0.0951 usec per loop

    moin@moin-pc:~$ python -m "timeit" "a='a';b='b';" "(a or b) and not (a and b)"
    10000000 loops, best of 3: 0.0951 usec per loop
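
One caveat when comparing these variants: the and/or forms return one of their operands (or the result of a not), while bool(a) ^ bool(b) always returns a bool. Inside an if statement only the truthiness matters, so the difference is invisible there. A minimal sketch, where the function names xor_and_or and xor_bitwise are made up for illustration:

```python
def xor_and_or(a, b):
    # Method 2 from the question: built from short-circuiting and/or
    return (not a and b) or (a and not b)


def xor_bitwise(a, b):
    # Method 1 from the question: bitwise ^ on two bools
    return bool(a) ^ bool(b)


# Every combination of an empty and a non-empty string
pairs = [("", ""), ("", "y"), ("x", ""), ("x", "y")]

for a, b in pairs:
    # Show the raw return value of each form side by side
    print(repr(a), repr(b), repr(xor_and_or(a, b)), repr(xor_bitwise(a, b)))
```

For the pair ("", "y") the and/or form returns the string "y" rather than True, so wrap the result in bool() if the value is stored or compared instead of being used directly as a condition.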
    

Note: These timings were measured with a and b as str values. The results depend on how the operands implement __nonzero__ / __bool__ / __or__, as viraptor points out in the comments.
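
To see why the operand type matters, it can help to count how often each expression asks an operand for its truth value. The sketch below is only an illustration: Counted is a hypothetical class standing in for an object whose truth test is expensive (for example, one that makes a remote call), and count_truth_tests is a made-up helper name.

```python
class Counted(object):
    """Hypothetical stand-in for an object whose truth test is expensive."""

    calls = 0  # shared counter of truthiness checks

    def __init__(self, value):
        self.value = value

    def __bool__(self):
        Counted.calls += 1
        return bool(self.value)

    __nonzero__ = __bool__  # Python 2 spelling of the same hook


def count_truth_tests(expr):
    """Evaluate expr(a, b) on two truthy operands and return the number of truth tests."""
    Counted.calls = 0
    expr(Counted(True), Counted(True))
    return Counted.calls


counts = {
    "bool(a) ^ bool(b)": count_truth_tests(lambda a, b: bool(a) ^ bool(b)),
    "bool(a) != bool(b)": count_truth_tests(lambda a, b: bool(a) != bool(b)),
    "(not a and b) or (a and not b)": count_truth_tests(
        lambda a, b: (not a and b) or (a and not b)
    ),
    "any([a, b]) and not all([a, b])": count_truth_tests(
        lambda a, b: any([a, b]) and not all([a, b])
    ),
}

for name, n in counts.items():
    print(n, name)
```

With two truthy operands, the bool() forms test each operand exactly once (two tests), while the and/or and any()/all() forms end up performing three. For a str that cost is negligible, which is consistent with the timings above; for an expensive truth test the ranking could change.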


6 Comments

a and not b or b and not a is equivalent. and has a stronger binding than or.
It depends on the implementation of __nonzero__ / __bool__ / __or__. For a str it's trivial. For something that does remote call it's not. Op said nothing about the values used.
@anonymous I think you may also see the benefit of and/or short-circuiting: Running it as a list comprehension only halves the time: for str in ("(bool(a) ^ bool(b))", "any([a, b]) and not all([a, b])", "a and not b or b and not a"): print timeit.timeit("[%s for a in range(2) for b in range(2)]" % str)
@viraptor: I agree with you on that. It depends entirely on the implementation of these functions. I have added your comment to the answer.
@TemporalWolf: I have added the timeit stats for the expression you mentioned to the answer.
1

You can make it more readable than reducing the problem to XOR. Depending on the context these may be better:

if sum((bool(a), bool(b))) == 1:  # this naturally extends to more values
if bool(a) != bool(b):

So I think the best way is to go with what matches the actual meaning behind the XOR. Do you want them to not have the same value? Only one of them set? Something else?
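
The distinction becomes visible once there are more than two values. A chained XOR answers "is an odd number of them truthy?", which is not the same question as "is exactly one of them set?". A small sketch, with exactly_one and odd_number_truthy as illustrative helper names that are not part of any library:

```python
from functools import reduce
from operator import xor


def exactly_one(*values):
    # True when exactly one of the values is truthy
    return sum(bool(v) for v in values) == 1


def odd_number_truthy(*values):
    # Chained XOR over the truth values: True when an odd number are truthy
    return reduce(xor, (bool(v) for v in values), False)


print(exactly_one("a", "", None))         # True: only "a" is truthy
print(exactly_one("a", "b", "c"))         # False: three values are truthy
print(odd_number_truthy("a", "b", "c"))   # True: three is odd
```

For exactly two values both helpers agree, which is why the difference is easy to overlook.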

If you use ^ and I'm reading the code, I'm going to assume you actually wanted a bitwise operation and that it matters for some reason.
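
A short illustration of why the bool() wrappers in Method 1 are not optional: applied directly to integers, ^ works bit by bit and can yield a truthy result even when both operands are truthy. The function names here are made up for the example.

```python
def raw_xor(a, b):
    # Bitwise XOR applied directly to the operands
    return a ^ b


def logical_xor(a, b):
    # Compare truth values instead of bits
    return bool(a) != bool(b)


print(raw_xor(1, 2))            # 3, which is truthy
print(bool(1) ^ bool(2))        # False
print(logical_xor(1, 2))        # False
```

Both 1 and 2 are truthy, so a logical XOR should be False, yet 1 ^ 2 evaluates to 3.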

Is there an inherent performance benefit to using either one?

It's one statement. Unless you know it's a performance issue, it doesn't matter. If it is in a hot loop and your profiler shows you do need to optimise it, then you're likely better off using Cython or some other method of speeding it up.

2 Comments

For speed, technically if (not a) is not (not b): would be slightly faster and equivalent to if bool(a) != bool(b):, at least on CPython, because not is syntax (adds a UNARY_NOT instruction), while bool adds a LOAD_GLOBAL and CALL_FUNCTION (both more expensive). Switching to is not means you follow the code path that only deals with identity equality, no rich comparison machinery.
Sure, but again: "Unless you know it's a performance issue, it doesn't matter." I know straight away what bool(a)!=bool(b) does. I have to spend time figuring out what (not a) is not (not b) does.
