1

Is there a function to compare how many characters two strings (of the same length) differ by? I mean only substitutions. For example, AAA would differ from AAT by 1 character.

1

4 Answers

4

This will work:

>>> str1 = "AAA"
>>> str2 = "AAT"
>>> sum(1 for x,y in enumerate(str1) if str2[x] != y)
1
>>> str1 = "AAABBBCCC"
>>> str2 = "ABCABCABC"
>>> sum(1 for x,y in enumerate(str1) if str2[x] != y)
6
>>>

The above solution uses sum, enumerate, and a generator expression.


Because bool is a subclass of int, True counts as 1 when summed, so you could even do:

>>> str1 = "AAA"
>>> str2 = "AAT"
>>> sum(str2[x] != y for x,y in enumerate(str1))
1
>>>

But I personally prefer the first solution because it is clearer.
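If you want to reuse this, here is a minimal sketch that wraps the first solution in a function. The name count_mismatches and the choice to raise ValueError on unequal lengths are my own assumptions, not something the question requires:

```python
def count_mismatches(str1, str2):
    """Count the positions at which two equal-length strings differ."""
    if len(str1) != len(str2):
        # Assumption: unequal lengths are treated as an error here,
        # since the question only covers strings of the same length.
        raise ValueError("strings must have the same length")
    # i is the index into str1, ch is the character at that index.
    return sum(1 for i, ch in enumerate(str1) if str2[i] != ch)


print(count_mismatches("AAA", "AAT"))              # 1
print(count_mismatches("AAABBBCCC", "ABCABCABC"))  # 6
```

Without the length check, a shorter str2 would raise an IndexError, and a longer str2 would have its extra characters silently ignored.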

3

This is a nice use case for the zip function!

def count_substitutions(s1, s2):
    return sum(x != y for (x, y) in zip(s1, s2))

Usage:

>>> count_substitutions('AAA', 'AAT')
1

From the docs:

zip(...)
    zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]

    Return a list of tuples, where each tuple contains the i-th element
    from each of the argument sequences.  The returned list is truncated
    in length to the length of the shortest argument sequence.
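Two things are worth keeping in mind about that docstring. It is the Python 2 text; in Python 3, zip returns a lazy iterator rather than a list, which makes no difference to sum. And the truncation it describes means strings of different lengths are compared only up to the shorter one. The sketch below shows that behavior next to a variant built on itertools.zip_longest; the helper name count_differences_padded is hypothetical, and counting every extra character as a difference is just one possible policy:

```python
from itertools import zip_longest


def count_substitutions(s1, s2):
    # zip stops at the end of the shorter string.
    return sum(x != y for x, y in zip(s1, s2))


def count_differences_padded(s1, s2):
    # Hypothetical variant: zip_longest pads the shorter string with None,
    # so each extra character in the longer string counts as a difference.
    return sum(x != y for x, y in zip_longest(s1, s2))


print(count_substitutions("AAA", "AATGG"))       # 1 (the trailing "GG" is ignored)
print(count_differences_padded("AAA", "AATGG"))  # 3
```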

3 Comments

Might I suggest avoiding variable names starting with lowercase L? I thought you were comparing 11 to 1
@mhlester: You have a point there. But what are good abbreviations for several letters in a string?
If it's only used in a comprehension, it doesn't much matter. I fully support your x/y :)
1

Building on what poke said, I would suggest the jellyfish package. It offers several string distance measures, including ones close to what you are asking for. Example from the documentation:

In [1]: import jellyfish

In [2]: jellyfish.damerau_levenshtein_distance('jellyfish', 'jellyfihs')
Out[2]: 1

or using your example:

In [3]: jellyfish.damerau_levenshtein_distance('AAA', 'AAT')
Out[3]: 1

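This also works for strings of different lengths. Be aware, though, that Damerau-Levenshtein distance counts insertions, deletions, and adjacent transpositions as well as substitutions, so it is not a pure substitution count. If you want substitutions only, jellyfish.hamming_distance is probably the closer fit.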
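To make the difference between the two measures concrete, here is a plain-Python sketch that does not need jellyfish installed. The helper name positional_mismatches is hypothetical. For 'jellyfish' and 'jellyfihs', the Damerau-Levenshtein result quoted above is 1, since swapping two adjacent characters is a single edit, whereas a position-by-position comparison sees two mismatches:

```python
def positional_mismatches(s1, s2):
    # Substitution-only count: compare characters position by position.
    return sum(a != b for a, b in zip(s1, s2))


# The swapped "sh"/"hs" differs in two positions.
print(positional_mismatches("jellyfish", "jellyfihs"))  # 2
# For a single substitution, both measures agree.
print(positional_mismatches("AAA", "AAT"))              # 1
```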

1

Similar to simon's answer, but you don't need zip just to call a function on the resulting tuples, because that's what map does anyway (and itertools.imap in Python 2). The operator module also has a handy function for !=. Hence:

import operator
sum(map(operator.ne, s1, s2))
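As a self-contained sketch of the one-liner, here it is wrapped in a function, next to an equivalent lambda version. The function names count_ne and count_ne_lambda are hypothetical. Like zip, map with several iterables stops at the shortest one in Python 3:

```python
import operator


def count_ne(s1, s2):
    # operator.ne(a, b) is the function form of a != b.
    return sum(map(operator.ne, s1, s2))


def count_ne_lambda(s1, s2):
    # Same result without the import, using a lambda instead.
    return sum(map(lambda a, b: a != b, s1, s2))


print(count_ne("AAA", "AAT"))         # 1
print(count_ne_lambda("AAA", "AAT"))  # 1
```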

3 Comments

Could the operator.ne be replaced with a lambda function?
@goodcow: sure, but I don't think it would improve things in any way other than reducing the number of import lines at the top of your file :-)
Nice! This version is quite a bit faster than my version on Python 3.3!
