I want to match a string to its closest candidate and later replace it with that match. I am using the stringdist library. Below is my code:
stringdistmatrix("2 ltr thums up", c("solar thyme 30g", "Thums Up 2 L"), method = "lv")
It gives the output as below:
[,1] [,2]
8 12
This means that "solar thyme 30g" is closer to "2 ltr thums up", but in reality "Thums Up 2 L" should be closer. Should I change the Levenshtein method to something else?
You could try the Jaro-Winkler method (method = "jw"), which I often find a better metric when trying to match strings. It is scaled between 0 and 1; note that stringdist returns it as a distance, so an exact match = 0 and a total non-match = 1. However, there is simply no magic way to make any of these functions reliably link character strings by their underlying referents. If you have more data for each string, you can try a record-linkage approach.
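As a rough sketch of how this might look in practice: the snippet below lower-cases each string and sorts its words before computing the "jw" distance, so that differences in case and word order stop dominating the score. The `normalise` helper is a hypothetical name of my own, not part of stringdist, and the token-sorting step is an assumption about what helps for product names like these, not a general fix.

```r
library(stringdist)

query      <- "2 ltr thums up"
candidates <- c("solar thyme 30g", "Thums Up 2 L")

# Hypothetical helper (not part of stringdist): lower-case each string,
# split it on whitespace, and re-join the words in sorted order.
normalise <- function(x) {
  tokens <- strsplit(tolower(x), "\\s+")
  vapply(tokens, function(tk) paste(sort(tk), collapse = " "), character(1))
}

# "jw" is a distance in stringdist: smaller values mean closer strings.
d <- stringdistmatrix(normalise(query), normalise(candidates), method = "jw")

# Pick the candidate with the smallest distance to the query.
best <- candidates[which.min(d)]
print(d)
print(best)
```

With these two candidates the normalised query is much closer to "Thums Up 2 L". Abbreviations such as "ltr" versus "L" are still only matched character by character, so a domain-specific clean-up step would be needed for harder cases.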