I have output from an old terminal system that needs to be concatenated. Here is an example of the output.

output1 = 
LINE ONE
LINE TWO 
LINE THREE
NO DATA
LINE FOUR
LINE FIVE
LINE SIX         %

output2 = 
LINE FOUR        %
LINE FIVE 
LINE SIX 
LINE SEVEN
NO DATA
LINE EIGHT
END

There may be as many as 5 output strings that need to be joined. The problem: each string overlaps the previous one, but some lines legitimately repeat (for example, NO DATA) and must not be removed, so a simple line-by-line comparison is not workable.

The correct answer should be:
LINE ONE
LINE TWO 
LINE THREE
NO DATA
LINE FOUR
LINE FIVE
LINE SIX 
LINE SEVEN
NO DATA
LINE EIGHT
END

Looking for a Java solution.

Any ideas? Thanks.

  • Do you receive the data as a stream, or in a single large extract/file? Commented Nov 26, 2011 at 5:51
  • Not sure exactly what you are asking. How do you have the strings? Why can't you just add all the strings to a list? Commented Nov 26, 2011 at 5:52
  • It comes in as separate streams, as described above. Commented Nov 26, 2011 at 5:54
  • The problem is that there is overlap (duplicated data) between the strings. I don't know how to drop the duplicated data and join the strings to make a complete message. A line-by-line comparison will not work because there may be some legitimate duplicate lines. Commented Nov 26, 2011 at 5:56
  • So you want to remove some duplicates but not all? In that case I'd build a list of the lines that are allowed to repeat; then, when doing your line comparison, I'd check whether the line is one of those. Commented Nov 26, 2011 at 6:01

2 Answers

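Hints:

Read each line of the output.

Create a List.

For each line, check:

if (!"NO DATA".equals(thisLine)) {
    // Ordinary line: add it only if it has not been seen before.
    if (!list.contains(thisLine)) { /* add to list */ }
} else {
    // "NO DATA" is allowed to repeat, so always add it.
    /* add to list */
}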

4 Comments

One problem with this: the duplicated lines may be any string at all, not just "NO DATA". There is no way to predict what the duplicate lines will be.
Is there anything you can identify about them?
The only thing identifiable is that there are multiple lines that repeat. As I describe above, it is output to an old terminal system. When the data is too large for the screen, a scroll entry is made and the remaining data is presented. Several lines at the bottom of the first output are repeated at the top of the next; how many depends on the length of the output.
You might want to skip the contains test and simply use a LinkedHashSet: docs.oracle.com/javase/7/docs/api/java/util/LinkedHashSet.html
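To see what the LinkedHashSet suggestion does with lines that legitimately repeat, here is a minimal sketch. The class name SetDedupDemo and the method dedupWithSet are made up for illustration, and the input is a shortened version of the sample from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical demo class: not part of any answer above.
class SetDedupDemo {

    // Removes every repeated line while keeping first-seen order.
    static List<String> dedupWithSet(List<String> lines) {
        return new ArrayList<String>(new LinkedHashSet<String>(lines));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "LINE THREE", "NO DATA", "LINE FOUR",
                "LINE SEVEN", "NO DATA", "LINE EIGHT");
        // The second "NO DATA" is dropped along with any overlap duplicates.
        System.out.println(dedupWithSet(lines));
    }
}
```

A set keeps insertion order but cannot tell an overlap duplicate from a legitimate repeat, so the second NO DATA disappears here. That is the same limitation the asker points out for the contains test.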

If you receive the data as one large file, you should start by taking a look at the Scanner class, as in:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.Scanner;

File f = new File("thefile.txt");
// FileReader can throw FileNotFoundException; catch it or declare it in the enclosing method.
Scanner s = new Scanner(new BufferedReader(new FileReader(f))).useDelimiter("END");
while (s.hasNext()) {
    String block = s.next(); // block now contains all text between instances of "END"
    // process the text block by splitting on \n
    String[] lines = block.split("\n");
    StringBuilder output = new StringBuilder();
    for (String line : lines) {
        // process each line, checking for duplicates, appending to output.
    }
    // write your final output to a file, etc.
}
s.close();

2 Comments

Won't this erase any legitimate identical lines as well as the duplicates? I think the answer lies in seeing the pattern across multiple lines, and not just each line.
@rob345 No, this is a good approach. Obviously you will need to redirect the scanner as necessary. The multi-line pattern logic you mention belongs where Jason's comment "// process each line, ..." is. You can put whatever protocol logic you need there.
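The multi-line pattern idea from the comments above can be sketched as an overlap merge: for each new block, find the longest run of lines at the end of the merged result that equals the start of the block, and append only what follows. This is a sketch under assumptions, not a tested solution for the real terminal. It assumes the trailing % in the sample is a scroll marker rather than data, that trailing whitespace is insignificant, and that blank lines can be dropped. The class and method names (OverlapMerger, normalize, toLines, overlap, merge) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class: names and normalization rules are assumptions.
class OverlapMerger {

    // Strip trailing whitespace and a trailing '%' (assumed to be a scroll marker).
    static String normalize(String line) {
        String s = line.replaceAll("\\s+$", "");
        if (s.endsWith("%")) {
            s = s.substring(0, s.length() - 1).replaceAll("\\s+$", "");
        }
        return s;
    }

    // Split a block into normalized, non-empty lines.
    static List<String> toLines(String block) {
        List<String> out = new ArrayList<String>();
        for (String raw : block.split("\\R")) {
            String s = normalize(raw);
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    // Length of the longest suffix of 'merged' that equals a prefix of 'next'.
    static int overlap(List<String> merged, List<String> next) {
        int max = Math.min(merged.size(), next.size());
        for (int k = max; k > 0; k--) {
            List<String> tail = merged.subList(merged.size() - k, merged.size());
            if (tail.equals(next.subList(0, k))) {
                return k;
            }
        }
        return 0;
    }

    // Join the blocks in order, skipping the overlapping lines of each new block.
    static List<String> merge(List<String> blocks) {
        List<String> merged = new ArrayList<String>();
        for (String block : blocks) {
            List<String> next = toLines(block);
            int k = overlap(merged, next);
            merged.addAll(next.subList(k, next.size()));
        }
        return merged;
    }

    public static void main(String[] args) {
        String output1 = String.join("\n", "LINE ONE", "LINE TWO ", "LINE THREE",
                "NO DATA", "LINE FOUR", "LINE FIVE", "LINE SIX         %");
        String output2 = String.join("\n", "LINE FOUR        %", "LINE FIVE ",
                "LINE SIX ", "LINE SEVEN", "NO DATA", "LINE EIGHT", "END");
        for (String line : merge(Arrays.asList(output1, output2))) {
            System.out.println(line);
        }
    }
}
```

For the two sample blocks in the question, the overlap found is three lines (LINE FOUR, LINE FIVE, LINE SIX), so both NO DATA lines survive and the result is the eleven expected lines with trailing whitespace trimmed. One caveat: if a block boundary falls inside a run of identical lines, the longest match may not be the true overlap, so the choice of longest rather than shortest match is a guess about how the terminal scrolls.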
