
I'm using the fuzzy matching feature of the regex module.

How can I get the "fuzziness value" of a match, i.e. a number indicating how different the pattern is from the string, like the Levenshtein edit distance?

I expected to find this value in the Match object, but it's not there, and the official docs don't mention it either.

For example:

import regex
a = regex.match('(?:foo){e}', 'for')

a.captures() tells me that "for" was matched, but I'd like to know the fuzziness value, which should be 1 here (one substitution: o → r).

Is there any way to achieve that?
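For reference, the number being asked for is the Levenshtein distance: the minimum number of single-character substitutions, insertions and deletions that turn one string into the other. The sketch below is the textbook dynamic-programming computation using only the standard library (the name `levenshtein` is illustrative, not part of the regex module); it is handy as a ground truth when checking the approaches on this page.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character substitutions, insertions and
    deletions needed to turn a into b (classic DP, keeping two rows)."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

print(levenshtein("foo", "for"))  # 1
```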

  • It's certainly not ideal, but if all else fails you could make repeated attempts with (?:foo){e<=i}, looping over i = 0, 1, 2 and so on. The first i that gives a match is the Levenshtein distance, provided the match has to cover the whole string (e.g. with regex.fullmatch). Commented Jun 10, 2013 at 12:53
  • Or, if you are working with a limited number of errors, you could use something like (foo)|((?:foo){e=1})|((?:foo){e=2}) and check which group matched: if the first, e = 0; if the second, e = 1; and so on. Commented Jun 10, 2013 at 13:10
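The loop from the first comment can be sketched as follows. It assumes the third-party regex module (pip install regex) and uses regex.fullmatch rather than regex.match: match only anchors at the start, so even an exact (?:foo) would already succeed on "food" by matching its prefix. The helper name min_errors_by_loop and the max_errors cap are illustrative assumptions, not part of the regex API.

```python
import regex  # third-party module: pip install regex


def min_errors_by_loop(pattern: str, text: str, max_errors: int = 10):
    """Smallest i such that (?:pattern){e<=i} matches the whole of text.

    Hypothetical helper; returns None if no full match is found with up to
    max_errors errors.
    """
    # i = 0: an exact full match needs no fuzzy constraint at all.
    if regex.fullmatch(pattern, text):
        return 0
    for i in range(1, max_errors + 1):
        # {e<=i} allows at most i substitutions, insertions and deletions in total.
        if regex.fullmatch(f"(?:{pattern}){{e<={i}}}", text):
            return i
    return None


print(min_errors_by_loop("foo", "for"))
```

Each iteration recompiles and reruns the pattern, so this is only practical for small error counts, which is why the cap is there.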

2 Answers

>>> import difflib
>>> matcher = difflib.SequenceMatcher(None, 'foo', 'for')
>>> sum(size for i, j, size in matcher.get_matching_blocks())  # matched characters
2
>>> max(map(len, ('foo', 'for'))) - _  # unmatched characters of the longer string
1
>>> matcher = difflib.SequenceMatcher(None, 'foo', 'food')
>>> sum(size for i, j, size in matcher.get_matching_blocks())
3
>>> max(map(len, ('foo', 'food'))) - _
1

http://docs.python.org/2/library/difflib.html#difflib.SequenceMatcher.get_matching_blocks http://docs.python.org/2/library/difflib.html#difflib.SequenceMatcher.get_opcodes
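To make the comparison asked about in the comments concrete, here is the measure above wrapped in a function next to a plain Levenshtein implementation (both function names are illustrative). The difflib documentation relates SequenceMatcher to Ratcliff and Obershelp's "gestalt pattern matching", which looks for the longest contiguous matching blocks rather than a minimal set of edits, so the two numbers can disagree.

```python
import difflib


def difflib_distance(a: str, b: str) -> int:
    """Length of the longer string minus the characters covered by
    SequenceMatcher's matching blocks (the measure used in the answer)."""
    matcher = difflib.SequenceMatcher(None, a, b)
    matched = sum(size for _, _, size in matcher.get_matching_blocks())
    return max(len(a), len(b)) - matched


def levenshtein(a: str, b: str) -> int:
    """Classic two-row DP edit distance (substitution, insertion, deletion)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


for a, b in [("foo", "for"), ("foo", "food"), ("ab", "ba")]:
    print(f"{a!r} vs {b!r}: difflib={difflib_distance(a, b)}, "
          f"levenshtein={levenshtein(a, b)}")
```

Both agree on the examples from the answer, but for "ab" and "ba" only one character can be part of a matching block, so the difflib measure is 1 while turning "ab" into "ba" takes 2 edits.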


3 Comments

Errrr... This workaround is great, and difflib is good stuff too. But I need the groupdict() feature of re/regex, so I don't think difflib is suitable for that. Thanks all the same.
I replaced the regex module with difflib like you said, and it works like a charm! So I wonder: is this SequenceMatcher comparable to the Levenshtein algorithm?
The source code doesn't mention Levenshtein anywhere (though I didn't read it thoroughly).
import regex

a = regex.match('(?:foo){e}', 'for')
a.fuzzy_counts

This returns a tuple (x, y, z) where:

x = number of substitutions

y = number of insertions

z = number of deletions

But this count is not always reliable, i.e. the number of errors the regex match reports might not equal the true Levenshtein distance in some cases.

See also: Python regex module fuzzy match: substitution count not as expected
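Part of the reason for that caveat is that, per the regex module's documentation, fuzzy matching returns the first match that satisfies the constraints by default, not necessarily the one with the fewest errors; the BESTMATCH flag, written inline as (?b), asks for the best match instead. A minimal sketch combining it with fullmatch (so the match must span the whole string) and summing fuzzy_counts might look like this; the helper name fuzzy_error_count is hypothetical, and checking its results against an independent edit-distance implementation is still a sensible precaution.

```python
import regex  # third-party module: pip install regex


def fuzzy_error_count(pattern: str, text: str) -> int:
    """Total errors (substitutions + insertions + deletions) in the best
    fuzzy full match of pattern against text. Hypothetical helper."""
    # (?b) turns on BESTMATCH, so the engine looks for the match with the
    # fewest errors instead of returning the first acceptable one.
    # fullmatch() requires the match to span all of text; an unconstrained
    # {e} can always achieve that, so a match object is expected here.
    m = regex.fullmatch(f"(?b)(?:{pattern}){{e}}", text)
    if m is None:
        raise ValueError(f"no fuzzy match of {pattern!r} against {text!r}")
    substitutions, insertions, deletions = m.fuzzy_counts
    return substitutions + insertions + deletions


for text in ["foo", "for", "food"]:
    print(text, fuzzy_error_count("foo", text))
```

BESTMATCH makes the engine explore more alternatives, so expect it to be slower than a plain fuzzy match on long inputs.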
