Python comparing two strings

Question

Is there a function that counts how many characters two strings of the same length differ by? I mean substitutions only. For example, AAA differs from AAT by 1 character.

Levenshtein distance for example?

– poke, Commented Feb 10, 2014 at 21:04

score 4 · Accepted Answer · 2014-02-10 21:10:54Z

This will work:

>>> str1 = "AAA"
>>> str2 = "AAT"
>>> sum(1 for x,y in enumerate(str1) if str2[x] != y)
1
>>> str1 = "AAABBBCCC"
>>> str2 = "ABCABCABC"
>>> sum(1 for x,y in enumerate(str1) if str2[x] != y)
6
>>>

The above solution uses sum, enumerate, and a generator expression.

Because bool is a subclass of int (True == 1 and False == 0), you can also sum the comparison results directly:

>>> str1 = "AAA"
>>> str2 = "AAT"
>>> sum(str2[x] != y for x,y in enumerate(str1))
1
>>>

But I personally prefer the first solution because it is clearer.
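Both one-liners above index into str2 using positions from str1, so they raise IndexError if str2 is shorter and silently ignore the extra characters if str2 is longer. A minimal sketch of a wrapper that checks the lengths first; the name `hamming_distance` and the choice to raise `ValueError` are assumptions for illustration, not part of the original answer:

```python
def hamming_distance(str1, str2):
    """Count the positions at which two equal-length strings differ."""
    # Assumption: unequal lengths are treated as an error rather than
    # counted as extra differences.
    if len(str1) != len(str2):
        raise ValueError("strings must have the same length")
    return sum(1 for i, ch in enumerate(str1) if str2[i] != ch)


print(hamming_distance("AAA", "AAT"))              # 1
print(hamming_distance("AAABBBCCC", "ABCABCABC"))  # 6
```

Failing loudly on a length mismatch matches the question, which only asks about strings of the same length.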

edited Feb 10, 2014 at 21:10

answered Feb 10, 2014 at 21:05

user2555451


sjakobi · Accepted Answer · 2014-02-10 21:22:01Z

3

This is a nice use case for the zip function!

def count_substitutions(s1, s2):
    return sum(x != y for (x, y) in zip(s1, s2))

Usage:

>>> count_substitutions('AAA', 'AAT')
1

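From the Python 2 docs (in Python 3, zip returns a lazy iterator instead of a list, but it truncates in the same way):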

zip(...)
    zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]

    Return a list of tuples, where each tuple contains the i-th element
    from each of the argument sequences.  The returned list is truncated
    in length to the length of the shortest argument sequence.
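Because zip stops at the shorter input, count_substitutions silently ignores trailing characters when the lengths differ. If that matters, one option in Python 3 is itertools.zip_longest, which pads the shorter string so that each unmatched character counts as a difference. A sketch under that assumption; the name `count_differences` is hypothetical:

```python
from itertools import zip_longest


def count_differences(s1, s2):
    # zip_longest pads the shorter string with None, which never equals
    # a character, so every unmatched trailing character counts as one
    # difference.
    return sum(x != y for x, y in zip_longest(s1, s2))


# Plain zip truncates to the shorter input and misses the extra 'T'.
print(sum(x != y for x, y in zip("AAA", "AAAT")))  # 0
print(count_differences("AAA", "AAAT"))            # 1
```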

edited Feb 10, 2014 at 21:22

answered Feb 10, 2014 at 21:06

sjakobi

3 Comments

mhlester Over a year ago

Might I suggest avoiding variable names starting with lowercase L? I thought you were comparing 11 to 1

sjakobi Over a year ago

@mhlester: You have a point there. But what are good abbreviations for several letters in a string?

mhlester Over a year ago

If it's only used in a comprehension, it doesn't much matter. I fully support your x/y :)

grromrell · Accepted Answer · 2014-02-10 21:11:03Z

1

Building on what poke said, I would suggest the jellyfish package. It offers several string distance measures along the lines of what you are asking for. An example from the documentation:

In [1]: jellyfish.damerau_levenshtein_distance('jellyfish', 'jellyfihs')
Out[1]: 1

or using your example:

In [2]: jellyfish.damerau_levenshtein_distance('AAA','AAT')
Out[2]: 1

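Unlike a position-by-position comparison, this also works for strings of different lengths. Note that Damerau-Levenshtein distance counts insertions, deletions, and transpositions as well as substitutions, so it can be smaller than the number of differing positions.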
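To see how an edit distance can differ from a pure substitution count, here is a small standard-library sketch. It implements plain Levenshtein distance (no transpositions), not jellyfish's own code, and the function names `hamming` and `levenshtein` are chosen for illustration only:

```python
def hamming(s1, s2):
    # Substitutions only: compare characters position by position.
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(s1, s2))


def levenshtein(s1, s2):
    # Classic dynamic-programming edit distance that allows insertions,
    # deletions, and substitutions. Only the previous row is kept.
    prev = list(range(len(s2) + 1))
    for i, a in enumerate(s1, 1):
        curr = [i]
        for j, b in enumerate(s2, 1):
            curr.append(min(prev[j] + 1,               # delete a
                            curr[j - 1] + 1,           # insert b
                            prev[j - 1] + (a != b)))   # substitute
        prev = curr
    return prev[-1]


print(hamming("AAA", "AAT"), levenshtein("AAA", "AAT"))      # 1 1
print(hamming("ABCD", "BCDA"), levenshtein("ABCD", "BCDA"))  # 4 2
```

The two measures agree on the question's example, but for shifted strings such as ABCD and BCDA every position differs while only two edits are needed.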

answered Feb 10, 2014 at 21:11

grromrell


Steve Jessop · Accepted Answer · 2014-02-10 22:23:52Z

1

Similar to simon's answer, but you don't need zip just to call a function on the resulting tuples: map does that itself when given several iterables (as does itertools.imap in Python 2). The operator module also has a ready-made function for !=. Hence:

import operator
sum(map(operator.ne, s1, s2))
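As a self-contained sketch of the idea, assuming Python 3 (where map, like zip, stops at the shorter input), here it is wrapped in a function alongside the lambda variant raised in the comments. Both function names are hypothetical:

```python
import operator


def count_substitutions(s1, s2):
    # map with two iterables calls operator.ne(a, b) on each aligned pair;
    # the resulting True/False values are summed as 1/0.
    return sum(map(operator.ne, s1, s2))


def count_substitutions_lambda(s1, s2):
    # Equivalent version that avoids the import by using a lambda.
    return sum(map(lambda a, b: a != b, s1, s2))


print(count_substitutions("AAA", "AAT"))                     # 1
print(count_substitutions_lambda("AAABBBCCC", "ABCABCABC"))  # 6
```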

edited Feb 10, 2014 at 22:23

answered Feb 10, 2014 at 22:08

Steve Jessop

3 Comments

goodcow Over a year ago

Could the operator.ne be replaced with a lambda function?

Steve Jessop Over a year ago

@goodcow: sure, but I don't think it would improve things in any way other than reducing the number of import lines at the top of your file :-)

sjakobi Over a year ago

Nice! This version is quite a bit faster than my version on Python 3.3!
