I have a string array like this:

    String tweetString = ExudeData.getInstance().filterStoppingsKeepDuplicates(tweets.text);
    // get array of words and split
    String[] wordArray = tweetString.split(" ");

After I split the string, I print the resulting array:

System.out.println(Arrays.toString(wordArray));

And the output I get is:

[new, single, fallin, dropping, days, artwork, hueshq, production, iseedaviddrums, amp, bigearl7, mix, reallygoldsmith, https, , , t, co, dk5xl4cicm, https, , , t, co, rvqkum0dk7]

What I want is to remove all instances of commas, https, and single letters like 't' (after using the split method above). So I want to end up with this:

[new, single, fallin, dropping, days, artwork, hueshq, production, iseedaviddrums, amp, bigearl7, mix, reallygoldsmith, co, dk5xl4cicm, co, rvqkum0dk7]

I've tried doing replaceAll like this:

String sanitizedString = wordArray.replaceAll("\\s+", " ").replaceAll(",+", ",");

But that just gave me the same initial output with no changes. Any ideas?

  • Easier to answer if you provide the original input. Note that split itself takes a regex. You might want to start with that. Commented Mar 12, 2019 at 14:31
  • @Mena - yes I want to parse out the strings I mentioned above after I used the split method if you read my actual question Commented Mar 12, 2019 at 14:35
  • It doesn't seem like you have actual commas in your output array, only empty strings. Commented Mar 12, 2019 at 14:36
  • Also, wordArray is an array, so you can't possibly have used replaceAll on it - it wouldn't have compiled. Commented Mar 12, 2019 at 14:38
  • oh good point, thanks for pointing those out! @RealSkeptic Commented Mar 12, 2019 at 14:41
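
The point made in the comments can be checked with a small sketch. The original tweet text is not given, so the sample string below is an assumption: it simply has three spaces between "https" and "t". Splitting on a single space produces empty strings at those positions, and Arrays.toString prints them as ", ,", which is what looks like stray commas in the output.

```java
import java.util.Arrays;

public class SplitDemo {
    // Splits on exactly one space, as in the question.
    static String[] splitOnSingleSpace(String text) {
        return text.split(" ");
    }

    // Splits on runs of whitespace, so consecutive spaces yield no empty tokens.
    static String[] splitOnWhitespaceRuns(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical input: three spaces between "https" and "t".
        String sample = "mix https   t co";
        System.out.println(Arrays.toString(splitOnSingleSpace(sample)));    // [mix, https, , , t, co]
        System.out.println(Arrays.toString(splitOnWhitespaceRuns(sample))); // [mix, https, t, co]
    }
}
```

So the array holds empty strings rather than commas, and passing a regex such as "\\s+" to split avoids most of them up front.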

3 Answers

2

If you are using Java 8 or later, you can split on whitespace and drop the empty strings with a stream:

String[] result = Arrays.stream(tweetString.split("\\s+"))
            .filter(s -> !s.isEmpty())
            .toArray(String[]::new);

"What I want is to remove all the instances of commas, https, and single letters like 't'"

In this case you can chain multiple filters, as @Andronicus does, or use matches with a regex, like so:

String[] result = Arrays.stream(tweetString.split("\\s+"))
            .filter(s -> !s.matches("https|.|\\s+"))
            .toArray(String[]::new);
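
For anyone who wants to run this, here is a minimal self-contained sketch of the same idea. The class name, method name, and sample text are made up for illustration. It keeps the isEmpty() check from the first snippet because split("\\s+") still returns one empty token when the input starts with whitespace, and "." does not match an empty string.

```java
import java.util.Arrays;

public class TweetWordFilter {
    // Splits on whitespace runs, then drops empty tokens, "https", and single characters.
    static String[] filterWords(String tweetString) {
        return Arrays.stream(tweetString.split("\\s+"))
                .filter(s -> !s.isEmpty())          // leading whitespace yields one empty token
                .filter(s -> !s.matches("https|.")) // matches() tests the whole token
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Hypothetical filtered tweet text.
        String sample = "mix reallygoldsmith https   t co dk5xl4cicm";
        System.out.println(Arrays.toString(filterWords(sample)));
        // [mix, reallygoldsmith, co, dk5xl4cicm]
    }
}
```

Because matches() has to match the entire token, "https" is removed but a longer word that merely contains it is kept.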

3 Comments

I'm using Java 11
@Hana What works in Java 8 also works in Java 11, except for deprecated APIs, so my code works in both.
THANK YOU! This one worked for me :) the second result you posted was what I was looking for
1

You can do something like this:

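String[] filtered = Arrays
    .stream(tweetString.split("[ ,]"))
    .filter(str -> str.length() > 1)       // drops empty strings and single characters
    .filter(str -> !str.equals("https"))   // drops the unwanted keyword
    .toArray(String[]::new);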
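
If the list of unwanted words grows, one possible variation is to keep them in a Set instead of adding a filter per word. This is only a sketch: the class name, the UNWANTED set, and its contents are assumptions, and Set.of requires Java 9 or later.

```java
import java.util.Arrays;
import java.util.Set;

public class StopWordFilter {
    // Hypothetical list of unwanted tokens; extend as needed.
    private static final Set<String> UNWANTED = Set.of("https", "http");

    static String[] filter(String tweetString) {
        return Arrays.stream(tweetString.split("[ ,]"))
                .filter(str -> str.length() > 1)        // drops empty strings and single characters
                .filter(str -> !UNWANTED.contains(str)) // drops any listed keyword
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(filter("amp, https  t co"))); // [amp, co]
    }
}
```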

1

Based on my comment, here is a quick solution. (Extend the regex with all your keywords.)

private static void replaceFromRegex(final String text) {
    String result = text.replaceAll("https($|\\s)| (?<!\\S)[^ ](?!\\S)", "");
    System.out.println(result);
}

and then test it:

public static void main(String[] args) throws Exception {
    replaceFromRegex("new single fallin dropping, , https");
}

Note: This is just a sample. You will have to extend the regex to cover other cases, such as a leading word (e.g. a string that starts with https followed by a space).
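
One way to make the pattern independent of where the token sits is to use whitespace lookarounds on both sides and clean up the leftover spaces afterwards. This is a sketch under my own assumptions, not a drop-in replacement: the class and method names are made up, and a token such as "dropping," keeps its attached comma, as it does with the regex above.

```java
public class RegexCleaner {
    static String clean(String text) {
        return text
                // Remove whole tokens that are "https" or a single non-space character,
                // whether they sit at the start, in the middle, or at the end.
                .replaceAll("(?<!\\S)(?:https|\\S)(?!\\S)", "")
                .trim()
                // Collapse the gaps left behind by the removed tokens.
                .replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(clean("https t co , new")); // co new
    }
}
```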
