I am using the Stream API. Once the relevant lines have been filtered, I'd like to edit the data being collected. Here is the code so far:

  String wordUp = word.substring(0,1).toUpperCase() + word.substring(1);
  String wordDown = word.toLowerCase();

  ArrayList<String> text = Files.lines(path)
        .parallel() // Perform filtering in parallel
        .filter(s -> s.contains(wordUp) || s.contains(wordDown) &&  Arrays.asList(s.split(" ")).contains(word))
        .sequential()
        .collect(Collectors.toCollection(ArrayList::new));

Edit: The code below is awful and I am trying to avoid it. (It also does not entirely work. It was done at 4am, please excuse it.)

    for (int i = 0; i < text.size(); i++) {
        String set = "";
        List temp = Arrays.asList(text.get(i).split(" "));
        int wordPos = temp.indexOf(word);

        List<String> com1 = (wordPos >= limit) ? temp.subList(wordPos - limit, wordPos) : new ArrayList<String>();
        List<String> com2 = (wordPos + limit < text.get(i).length() -1) ? temp.subList(wordPos + 1, wordPos + limit) : new ArrayList<String>();
        for (String s: com1)
            set += s + " ";
        for (String s: com2)
            set += s + " ";
        text.set(i, set);
    }

It looks for a particular word in a text file. Once a line has passed the filter, I'd like to collect only a portion of it each time: a number of words (limit) on either side of the keyword being searched for.

e.g.:

keyword = "the", limit = 1

It would find: "Early in the morning a cow jumped over a fence."

It should then return: "in the morning"

P.S. Any suggested speed improvements will be up-voted.

  • I don't see how you use this limit in your code... Commented Mar 9, 2015 at 12:35
  • To modify elements, use map method of the stream. Commented Mar 9, 2015 at 12:36
  • And also, what should happen if the keyword is the first in the sentence and limit is 1? Commented Mar 9, 2015 at 12:36
  • What's the difference between wordUp, wordDown and word? Commented Mar 9, 2015 at 12:37
  • There is no sense in calling .parallel() and .sequential() on the same stream. A stream is either parallel or sequential. Note that collect works flawlessly with parallel streams. Further, your condition x || y && z looks suspicious; mind the operator precedence. But it’s not clear what it is supposed to do anyway. Commented Mar 9, 2015 at 13:38
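
Two of the points raised in the comments can be illustrated with a small sketch. It shows that `a || b && c` is parsed as `a || (b && c)`, and that `map` is the stream operation for transforming elements after `filter`. The class name `FilterThenMap`, its helper methods, and the choice of `String::trim` as the transformation are assumptions made for illustration only; they are not taken from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FilterThenMap {
    // && binds tighter than ||, so this is evaluated as a || (b && c).
    static boolean asWritten(boolean a, boolean b, boolean c) {
        return a || b && c;
    }

    // Explicit parentheses give the other possible reading.
    static boolean grouped(boolean a, boolean b, boolean c) {
        return (a || b) && c;
    }

    // filter selects the lines containing the word as a whole token;
    // map then transforms each surviving line (here: trims it, as a placeholder).
    static List<String> trimmedMatches(Stream<String> lines, String word) {
        return lines
            .filter(s -> Arrays.asList(s.split(" ")).contains(word))
            .map(String::trim)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(asWritten(true, false, false)); // true
        System.out.println(grouped(true, false, false));   // false
        System.out.println(trimmedMatches(
            Stream.of("  in the morning ", "no match here"), "the"));
    }
}
```

Whichever grouping was intended in the original filter, writing the parentheses explicitly removes the ambiguity for the reader.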

1 Answer

There are two different tasks you should think about. First, convert a file into a list of words:

List<String> words = Files.lines(path)
    .flatMap(Pattern.compile(" ")::splitAsStream)
    .collect(Collectors.toList());

This uses your initial idea of splitting at space characters. This might be sufficient for simple tasks; however, you should study the documentation of BreakIterator to understand the difference between this simple approach and real, sophisticated word-boundary splitting.
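
To make the difference concrete, here is a minimal sketch of word splitting with `java.text.BreakIterator`. The class name `WordSplitter` and the rule of keeping only segments that contain a letter or digit are assumptions for illustration; `BreakIterator` itself reports every boundary, including those around spaces and punctuation, so some filtering of that kind is needed.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordSplitter {
    // Walks the word boundaries reported by BreakIterator and keeps only
    // the segments that contain at least one letter or digit, which drops
    // whitespace and stand-alone punctuation.
    static List<String> words(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Early in the morning a cow jumped over a fence."));
    }
}
```

Unlike splitting at spaces, this should yield "fence" rather than "fence." for the last word, so a search for a word at the end of a sentence still matches.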

Second, if you have a list of words, your task is to find matches of your word and convert sequences of items around the match into a single match String by joining the words using a single space character as delimiter:

List<String> matches=IntStream.range(0, words.size())
    // find matches
    .filter(ix->words.get(ix).matches(word))
    // create subLists around the matches
    .mapToObj(ix->words.subList(Math.max(0, ix-1), Math.min(ix+2, words.size())))
    // reconvert lists into phrases (join with a single space)
    .map(list->String.join(" ", list))
    // collect into a list of matches; here, you can use a different
    // terminal operation, like forEach(System.out::println), as well
    .collect(Collectors.toList());
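
The snippet above hard-codes one word of context on each side. As a sketch of how it could be generalized to the `limit` from the question, the following self-contained version takes the limit as a parameter. The class name `KeywordContext`, the method `contexts`, and the use of `equalsIgnoreCase` instead of `matches` are assumptions for illustration (the question's `wordUp`/`wordDown` variables suggest a case-insensitive match was wanted, but that is a guess).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class KeywordContext {
    // For every occurrence of keyword, returns a phrase made of up to
    // `limit` words before it, the keyword itself, and up to `limit`
    // words after it. The window is clamped at both ends of the list.
    static List<String> contexts(List<String> words, String keyword, int limit) {
        return IntStream.range(0, words.size())
            .filter(ix -> words.get(ix).equalsIgnoreCase(keyword))
            .mapToObj(ix -> words.subList(Math.max(0, ix - limit),
                                          Math.min(ix + limit + 1, words.size())))
            .map(list -> String.join(" ", list))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
            "Early in the morning a cow jumped over a fence.".split(" "));
        System.out.println(contexts(words, "the", 1)); // prints [in the morning]
    }
}
```

Because the window is clamped with `Math.max` and `Math.min`, a keyword at the very start or end of the text simply produces a shorter phrase instead of throwing an exception.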

3 Comments

This answer is fantastic; it is elegant and exactly the sort of answer I was looking for. Thank you so much. I like that it avoids any issues that might arise when selecting words by line. I will have a look at the link you suggested. One more thing: do you have a link or advice on how I could find the time complexity of something like this?
I could be wrong, but I assume the time complexity of both is O(n), since they go through all n elements in the stream. However, together would they be O(n) + O(n) = O(2n)?
@Warosaurus Technically, O(2n) = O(n) ;-)
