3

I have a list or an array of string

String [] elements = {"cat", "dog", "fish"};

and a string

String str = "This is a caterpillar and that is a dogger.";

I want to remove all the items of the array/list from the string if any exists in the string. so that the function should return a string

str = "This is a erpillar and that is a ger." (cat and dog removed from the string)

I can do something like this

private String removeElementsFromString (String str, String [] elements) {
        if(Arrays.stream(elements).anyMatch(str::contains)){
            for(String item : elements){
                str = str.replace(item, "");
            }
        }
        return str;
    }

but what is the elegant way to change the for loop to something else.

4
  • "I have a list or an array of string" This is an array, not a list. Commented Oct 15, 2021 at 15:06
  • 1
    What would you want to do for something like "docatg" (dog with cat in the middle of it)? Do you want to first remove the cat, then remove the dog? Or do you want to just remove the cat? Commented Oct 15, 2021 at 15:14
  • 1
    Note cases such as elements = {"cat", "catcher",...} With a phrase "pass it to catcher", if "cat is removed first, then the phrase will be "pass it to her" Commented Oct 15, 2021 at 15:15
  • You could also have competing overlaps like ["app","ply"] With the phrase "apply", we could remove "app" and leave "ly", or remove "ply" and leave "ap". Commented Oct 15, 2021 at 15:25

6 Answers 6

6

One-liner solution

The following one-liner does the job:

str = str.replaceAll(Arrays.stream(elements).map(s -> "(?:" + s + ")").collect(Collectors.joining("|")), "");

Demo:

import java.util.Arrays;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        String[] elements = { "cat", "dog", "fish" };
        String str = "This is a caterpillar and that is a dogger.";
        
        str = str.replaceAll(Arrays.stream(elements).map(s -> "(?:" + s + ")").collect(Collectors.joining("|")), "");

        System.out.println(str);
    }
}

Output:

This is a erpillar and that is a ger.

ONLINE DEMO

Explanation:

Arrays.stream(elements).map(s -> "(?:" + s + ")").collect(Collectors.joining("|")) results into the regex, (?:cat)|(?:dog)|(?:fish) which means cat or dog or fish.

The next step is to replace this resulting regex by "".

Sign up to request clarification or add additional context in comments.

1 Comment

Second time my comment is deleted without explanation. This solution does not work for any input string (e.g. ":)") Can you prove that this is not true?
3

Another Solution with StringBuilder :

because it is much faster and consumes less memory.

I think that using StringBuilder instead of String is more appropriate here:

import java.io.IOException;
import java.util.stream.Stream;

public class Bounder {

public static void main(String[] args) throws IOException {
    String[] elements = { "cat", "dog", "fish" };
    String str = "This is a catcatcatcatcatcatcaterpillar ancatcatcatcatd thcatcatcatat is a dogdogdogdogdogdogger.";
// Use StringBuilder here instead of String     
StringBuilder bf = new StringBuilder(str);
    str =null;

    System.out.println("Original String   =  " + bf.toString());
    Stream.of(elements).forEach(e -> {
        int index = bf.indexOf(e);
        while (index != -1) {
            index = bf.indexOf(e);
            if (index != -1) {
                bf.delete(index, index + e.length());
            }
        }
    });

    System.out.println("Result            =  " + bf.toString());
}
}

Output :

  Original String   =  This is a catcatcatcatcatcatcaterpillar ancatcatcatcatd thcatcatcatat is a dogdogdogdogdogdogger.

  Result            =  This is a erpillar and that is a ger.

1 Comment

Your solution still has cost O(n^2) if you really want an efficient solution use Aho-Corasick with cost O(n).
2
Arrays.stream(elements).reduce(str, (r, w) -> r.replace(w, ""))

with the expected output.

If you want to reduce the input string until it is no longer possible, it is best to iterate until there are no changes

String n = str, o = null;
do {
    n = stream(elements).reduce(o = n, (r, w) -> r.replace(w, ""));
} while(!n.equals(o));

System.out.println(n);

then, with input string

This is a caterpillar and that is a docatg.

you'll get

This is a erpillar and that is a .

If really want a fast algorithm use Aho-Corasick with cost O(n)

    StringBuilder sb = new StringBuilder();
    int begining = -1;
    for (Emit e : Trie.builder().addKeywords(elements).build().parseText(str)) {
        sb.append(str, begining + 1, e.getStart());
        begining = e.getEnd();
    }
    sb.append(str, begining + 1, str.length());

    System.out.println(sb.toString());

Aside solution performance comparison (with Oussama ZAGHDOUD's solution):

Equals = true       // check all output are equals
Time1 = 18,548822   // Oussama ZAGHDOUD's solution O(n^2)
Time2 = 0,134459    // Aho-Corasick O(n) without precompute Trie
Time3 = 0,065056    // Aho-Corasick O(n) precomputed Trie

full bench code

static String alg1(String[] elements, String str) {
    StringBuilder bf = new StringBuilder(str);
    str =null;
    Stream.of(elements).forEach(e -> {
        int index = bf.indexOf(e);
        while (index != -1) {
            index = bf.indexOf(e);
            if (index != -1) {
                bf.delete(index, index + e.length());
            }
        }
    });
    return bf.toString();
}

static String alg2(String[] elements, String str) {
    StringBuilder sb = new StringBuilder();
    int begining = -1;
    for (Emit e : Trie.builder().addKeywords(elements).build().parseText(str)) {
        sb.append(str, begining + 1, e.getStart());
        begining = e.getEnd();
    }
    sb.append(str, begining + 1, str.length());

    return sb.toString();
}

static String alg3(Trie trie, String str) {
    StringBuilder sb = new StringBuilder();
    int begining = -1;
    for (Emit e : trie.parseText(str)) {
        sb.append(str, begining + 1, e.getStart());
        begining = e.getEnd();
    }
    sb.append(str, begining + 1, str.length());

    return sb.toString();
}

public static void main(String... args) throws JsonProcessingException {

    final ThreadLocalRandom rnd = ThreadLocalRandom.current();

    // test, use random numbers as words
    String[] elements = range(0, 1_000).mapToObj(i -> "w" + rnd.nextInt()).toArray(String[]::new);

    // intercalate random elements word with other random word
    String str = range(0, 100_000)
            .mapToObj(i -> "z" + rnd.nextInt() + " " + elements[rnd.nextInt(elements.length)])
            .collect(joining(", "));

    Trie trie = Trie.builder().addKeywords(elements).build();

    long t0 = System.nanoTime();
    String s1 = alg1(elements, str);
    long t1 = System.nanoTime();
    String s2 = alg2(elements, str);
    long t2 = System.nanoTime();
    String s3 = alg3(trie, str);
    long t3 = System.nanoTime();

    System.out.printf("Equals = %s%nTime1 = %f%nTime2 = %f%nTime3 = %f%n",
            s1.equals(s2) && s2.equals(s3), (t1 - t0) * 1e-9, (t2 - t1) * 1e-9, (t3 - t2) * 1e-9);
}

1 Comment

@Beginner You think my comments are negative because they point out flaws in those solutions. Are the flaws in those solutions or in my comments? Think about whether you are looking for the truth or just being complacent at the cost of misleading you. (Why then have others deleted their previous comments?)
1

I would simply use:

private String removeElementsFromString(String str, String[] elements) {
    for (String item : elements) {
        str = str.replace(item, "");
    }
    return str;
}

I don't see any advantage of the first condition:

if(Arrays.stream(elements).anyMatch(str::contains)) {

Comments

0

The most concise way would be to use replaceAll, which accepts a regular expression as the first parameter:

String newStr = str.replaceAll(String.join("|", elements), "");

This only works because the things in elements have no special regex characters. If any of them did (or there was a chance they did), you'd have to quote them:

String pattern = Arrays.stream(elements).map(Pattern::quote).collect(Collectors.joining("|"));

Note, however, that this would operate in a single pass. So if you had a string like:

docatg

this approach would result in dog, whereas an approach which does input.replace("cat", "").replace("dog", "") would remove the dog as well.

2 Comments

nope, words are not regex
@josejuan I don't understand what you mean. Some words are valid regex. All Pattern-quoted strings are valid regex.
0

You can do it like this. Just use a simple loop.

for (String word : elements) {
            str = str.replace(word,"");
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.