59

I need to compute diffs between Java strings, and I would like to be able to rebuild a string from the original string and the diff versions. Has anyone done this in Java? What library do you use?

String a1; // This can be a long text
String a2; // e.g. the above text with spelling corrections
String a3; // e.g. the above text with spelling corrections and an additional sentence

// Hypothetical API I am looking for:
Diff diff = new Diff();
String differences_a1_a2 = diff.getDifferences(a1, a2);
String differences_a2_a3 = diff.getDifferences(a2, a3);
String[] diffs = new String[]{a1, differences_a1_a2, differences_a2_a3};
String new_a3 = diff.build(diffs);
a3.equals(new_a3); // this is true
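A minimal sketch of the idea in plain Java, with no library: each delta records only how many characters of the old version are kept at the start and at the end, plus the replacement text in between. `PrefixSuffixDelta`, `Delta`, `diff`, `apply` and `build` are hypothetical names chosen to mirror the question, not an existing API. This captures only one contiguous change per step, so for scattered edits the stored middle can be nearly the whole text; a real diff library produces much smaller deltas.

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixSuffixDelta {

    // Hypothetical delta format: keep `prefix` chars from the start of the old
    // string and `suffix` chars from its end, and put `middle` in between.
    static final class Delta {
        final int prefix;
        final int suffix;
        final String middle;

        Delta(int prefix, int suffix, String middle) {
            this.prefix = prefix;
            this.suffix = suffix;
            this.middle = middle;
        }
    }

    // Trims the common prefix and common suffix; whatever remains of newText
    // becomes the middle part of the delta.
    static Delta diff(String oldText, String newText) {
        int max = Math.min(oldText.length(), newText.length());
        int prefix = 0;
        while (prefix < max && oldText.charAt(prefix) == newText.charAt(prefix)) {
            prefix++;
        }
        int suffix = 0;
        // Bound the suffix so it never overlaps the prefix in either string.
        while (suffix < max - prefix
                && oldText.charAt(oldText.length() - 1 - suffix)
                   == newText.charAt(newText.length() - 1 - suffix)) {
            suffix++;
        }
        String middle = newText.substring(prefix, newText.length() - suffix);
        return new Delta(prefix, suffix, middle);
    }

    // Rebuilds the new string from the old string and one delta.
    static String apply(String oldText, Delta d) {
        return oldText.substring(0, d.prefix)
                + d.middle
                + oldText.substring(oldText.length() - d.suffix);
    }

    // Replays a chain of deltas on top of the base version.
    static String build(String base, List<Delta> deltas) {
        String current = base;
        for (Delta d : deltas) {
            current = apply(current, d);
        }
        return current;
    }

    public static void main(String[] args) {
        String a1 = "Teh quick brown fox.";
        String a2 = "The quick brown fox.";
        String a3 = "The quick brown fox. It jumps.";

        List<Delta> deltas = new ArrayList<>();
        deltas.add(diff(a1, a2));
        deltas.add(diff(a2, a3));

        System.out.println(a3.equals(build(a1, deltas))); // prints true
    }
}
```

Storing only the base text plus the list of deltas is enough to reconstruct every later version, which is the property the question asks for.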

9 Answers

58

This library seems to do the trick: google-diff-match-patch. It can create a patch string from the differences and lets you reapply the patch.

Edit: another option is java-diff-utils: https://code.google.com/p/java-diff-utils/


5 Comments

Those are different libs, FWIW
There is a Maven repository for google-diff-match-patch.
The actively maintained fork of java-diff-utils seems to be github.com/bkromhout/java-diff-utils
google-diff-match-patch on github: github.com/GerHobbelt/google-diff-match-patch
The maintained fork now seems to be github.com/java-diff-utils/java-diff-utils
27

Apache Commons Lang has a string diff:

org.apache.commons.lang.StringUtils

StringUtils.difference("foobar", "foo");

3 Comments

It returns the remainder of the second String, starting from where it differs from the first, which is not enough for me since I would be working with big texts. See: StringUtils.difference("ab", "abxyz") -> "xyz"; StringUtils.difference("ab", "xyzab") -> "xyzab"
Also beware this gotcha: StringUtils.difference("abc", "") = ""; StringUtils.difference("abc", "abc") = ""
I'm looking for an equivalent of the Unix wdiff command in Java; see the man page at docs.oracle.com/cd/E88353_01/html/E37839/wdiff-1.html and this SO answer: stackoverflow.com/a/17290563/3281336
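To see why this cannot be used to rebuild a string, here is a plain-Java sketch that reproduces the behaviour in the examples above (null handling omitted). It is not the Commons Lang source; `DifferenceDemo` and its `difference` method are illustrative names. Everything about what was removed from the first string is lost, which is why "abc" vs "" and "abc" vs "abc" give the same result.

```java
public class DifferenceDemo {

    // Mimics the described behaviour of StringUtils.difference(str1, str2):
    // returns str2 from the first index at which the two strings differ.
    // For identical strings that index is the end, so the result is "".
    static String difference(String str1, String str2) {
        int i = 0;
        while (i < str1.length() && i < str2.length()
                && str1.charAt(i) == str2.charAt(i)) {
            i++;
        }
        return str2.substring(i);
    }

    public static void main(String[] args) {
        System.out.println(difference("ab", "abxyz"));  // prints xyz
        System.out.println(difference("ab", "xyzab"));  // prints xyzab
        System.out.println(difference("abc", ""));      // prints an empty line
        System.out.println(difference("abc", "abc"));   // prints an empty line
    }
}
```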
4

The java-diff-utils library might be useful.

1 Comment

The repo github.com/bkromhout/java-diff-utils forked indirectly from the original GitHub repository and is better maintained. Maybe you can join forces there?
3

As Torsten says, you can use

import org.apache.commons.lang.StringUtils;

System.err.println(StringUtils.getLevenshteinDistance("foobar", "bar"));
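For reference, a self-contained sketch of the classic dynamic-programming Levenshtein distance (not the Commons implementation; `Levenshtein.distance` is an illustrative name). It only yields a count; recovering the actual edits would require walking back through the table.

```java
public class Levenshtein {

    // d[i][j] is the minimum number of single-character insertions, deletions
    // and substitutions needed to turn the first i chars of s into the first
    // j chars of t.
    static int distance(String s, String t) {
        int[][] d = new int[s.length() + 1][t.length() + 1];
        for (int i = 0; i <= s.length(); i++) d[i][0] = i; // delete all of s
        for (int j = 0; j <= t.length(); j++) d[0][j] = j; // insert all of t
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                            d[i][j - 1] + 1),  // insertion
                                   d[i - 1][j - 1] + cost);    // substitution or match
            }
        }
        return d[s.length()][t.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("foobar", "bar")); // prints 3
    }
}
```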

3 Comments

Thank you, but getLevenshteinDistance just returns an integer. That's not enough to rebuild the strings.
@hstoerr you are correct, I must have missed this part in the original question. Long time ago now :)
That method is also deprecated.
1

If you need to deal with differences between large amounts of data and want the differences efficiently compressed, you could try a Java implementation of xdelta, which implements RFC 3284 (VCDIFF) for binary diffs (it should work with strings too).


1

For a regression test (where I didn't need diffing support in production), I found it useful to discover that AssertJ provides built-in access to java-diff-utils. See its DiffUtils, InputStream, or Diff classes, for example.


0

Use the Levenshtein distance and extract the edit log from the matrix the algorithm builds up. The Wikipedia article links to a couple of implementations; I'm sure there's a Java implementation among them.

Levenshtein distance is closely related to the Longest Common Subsequence problem (an LCS gives the edit distance when only insertions and deletions are allowed), so you might also want to have a look at that.
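A minimal sketch of the LCS approach in plain Java: fill the LCS length table, walk it to produce a keep/insert/delete script, then replay the script on the old string to rebuild the new one. `LcsDiff`, `Op`, `Edit`, `diff` and `apply` are illustrative names, not a library API. It takes O(n*m) time and memory and stores one entry per character, so for long texts the libraries mentioned in the other answers (which use Myers' diff algorithm or similar) are a better fit.

```java
import java.util.ArrayList;
import java.util.List;

public class LcsDiff {

    enum Op { KEEP, INSERT, DELETE }

    // One step of the edit script: KEEP and DELETE consume a char of the old
    // string; KEEP and INSERT produce a char of the new string.
    static final class Edit {
        final Op op;
        final char ch;
        Edit(Op op, char ch) { this.op = op; this.ch = ch; }
    }

    static List<Edit> diff(String a, String b) {
        int n = a.length(), m = b.length();
        // lcs[i][j] = length of the LCS of a.substring(i) and b.substring(j)
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.charAt(i) == b.charAt(j)
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        // Walk the table from the start, always following an optimal choice.
        List<Edit> script = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.charAt(i) == b.charAt(j)) {
                script.add(new Edit(Op.KEEP, a.charAt(i)));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                script.add(new Edit(Op.DELETE, a.charAt(i++)));
            } else {
                script.add(new Edit(Op.INSERT, b.charAt(j++)));
            }
        }
        while (i < n) script.add(new Edit(Op.DELETE, a.charAt(i++)));
        while (j < m) script.add(new Edit(Op.INSERT, b.charAt(j++)));
        return script;
    }

    // Replays the script on the old string; KEEP and INSERT chars form the result.
    static String apply(String a, List<Edit> script) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        for (Edit e : script) {
            switch (e.op) {
                case KEEP:
                    out.append(a.charAt(i++));
                    break;
                case DELETE:
                    i++;
                    break;
                case INSERT:
                    out.append(e.ch);
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Edit> script = diff("kitten", "sitting");
        for (Edit e : script) {
            System.out.println(e.op + " " + e.ch);
        }
        System.out.println(apply("kitten", script)); // prints sitting
    }
}
```

A compact storage format would keep only the INSERT/DELETE entries plus run lengths for KEEP, but the full script makes the reconstruction step easy to follow.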


0

Apache Commons Text now has StringsComparator:

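import org.apache.commons.text.diff.CommandVisitor;
import org.apache.commons.text.diff.StringsComparator;

StringsComparator c = new StringsComparator(s1, s2);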
c.getScript().visit(new CommandVisitor<Character>() {

    @Override
    public void visitKeepCommand(Character object) {
        System.out.println("k: " + object);
    }

    @Override
    public void visitInsertCommand(Character object) {
        System.out.println("i: " + object);
    }

    @Override
    public void visitDeleteCommand(Character object) {
        System.out.println("d: " + object);
    }
});


-7
public class Stringdiff {

    public static void main(String[] args) {
        System.out.println(strcheck("sum", "sumsum")); // prints "sum"
    }

    // Returns the tail of the longer string, starting at the first index where
    // the two strings differ, or "Empty" if they are identical.
    public static String strcheck(String str1, String str2) {
        if (str1 == null || str2 == null) {
            return "Invalid";
        }
        int num = diffcheck1(str1, str2);
        if (num == -1) {
            return "Empty";
        }
        return str1.length() > str2.length() ? str1.substring(num) : str2.substring(num);
    }

    // Returns the first index at which str1 and str2 differ, or -1 if they are equal.
    public static int diffcheck1(String str1, String str2) {
        int minLength = Math.min(str1.length(), str2.length());
        int i;
        for (i = 0; i < minLength; i++) {
            if (str1.charAt(i) != str2.charAt(i)) {
                return i;
            }
        }
        // All shared characters match; the strings differ only if one is longer.
        return (i < str1.length() || i < str2.length()) ? i : -1;
    }
}

1 Comment

Untested plain text code like this almost never makes sense. Create a project on a FLOSS code hosting page and provide the code + tests there.
