I want to match a string to its closest candidate and later replace it with that match. I am using the stringdist package. Below is my code:

library(stringdist)
stringdistmatrix("2 ltr thums up", c("solar thyme 30g", "Thums Up 2 L"), method = "lv")

It gives the following output:

     [,1] [,2]
[1,]    8   12

This suggests that "solar thyme 30g" is closer to "2 ltr thums up", but in reality "Thums Up 2 L" should be the closer match. Should I switch from the Levenshtein method to something else?

    You can try the Jaro-Winkler distance (method = "jw"), which I often find a better metric for matching strings. In stringdist it is scaled to [0, 1], with 0 for an exact match and 1 for a total non-match. However, there is no magic way to make any of these functions reliably link character strings by their underlying referents. If you have more data for each string, you can try a record-linkage approach. Commented Feb 15, 2019 at 11:37
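    For anyone who wants to try the Jaro-Winkler suggestion, here is a minimal sketch using the strings from the question. Lowercasing the inputs and the prefix weight p = 0.1 are my own assumptions, not something the commenter specified, and whether the resulting ranking is right for your data still has to be checked by eye.

```r
library(stringdist)

query <- "2 ltr thums up"
candidates <- c("solar thyme 30g", "Thums Up 2 L")

# Jaro-Winkler distance: 0 means identical, 1 means nothing in common.
# p is the Winkler prefix weight (p = 0 gives plain Jaro); 0.1 is an assumed choice.
jw <- stringdist(tolower(query), tolower(candidates), method = "jw", p = 0.1)

# stringsim() reports the same comparison as a similarity (1 = identical).
sim <- stringsim(tolower(query), tolower(candidates), method = "jw", p = 0.1)

print(setNames(jw, candidates))
print(setNames(sim, candidates))
```

    Keep in mind that Jaro-Winkler rewards a shared prefix, so it may still struggle when the words appear in a different order, as they do here.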

1 Answer

I tried method = 'cosine' and the output looks fine.
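To make that a bit more concrete: the cosine method in stringdist compares q-gram count profiles rather than edit operations, so it is much less sensitive to word order than Levenshtein. The sketch below is only an illustration under my own assumptions: `closest_match` is a hypothetical helper (not part of stringdist), and lowercasing the inputs is a choice I added so that "T" and "t" count as the same character.

```r
library(stringdist)

query <- "2 ltr thums up"
candidates <- c("solar thyme 30g", "Thums Up 2 L")

# Hypothetical helper: return the candidate whose q-gram profile has the
# smallest cosine distance to x. Inputs are lowercased before comparison.
closest_match <- function(x, table, q = 1) {
  d <- stringdist(tolower(x), tolower(table), method = "cosine", q = q)
  table[which.min(d)]
}

best_q1 <- closest_match(query, candidates, q = 1)  # single characters (default q)
best_q2 <- closest_match(query, candidates, q = 2)  # character pairs, a bit stricter

print(best_q1)
print(best_q2)
```

On the two candidates from the question, both settings pick "Thums Up 2 L". Note that with q = 1 the character order is ignored entirely, so anagrams get a distance of 0; a larger q keeps some local order at the cost of being less forgiving of abbreviations such as "ltr" versus "L".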


1 Comment

"Looks fine" how? More information on what you tried and what your criteria are would make this helpful to other users in the future.
