
I have a Java application which streams Twitter data.

Suppose I have a variable String text = tweet.getText().

A tweet's text can contain one or more mentions such as @MentionedUser. I'd like to delete not just the @ but the username too. How can I do this with replaceAll, without touching the rest of the string?

Thank you.

4 Comments

  • text = text.replaceAll("@\\w+", ""); Commented Dec 1, 2017 at 16:19
  • Thank you. Could you explain what the "\\w+" characters mean? Commented Dec 1, 2017 at 16:20
  • \w means a word character; + is a greedy quantifier matching one or more of them. Commented Dec 1, 2017 at 16:22
  • About the title: what would a non-custom regexp be? :-) Commented Dec 1, 2017 at 17:04
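
To make the suggestion from the comments concrete, here is a minimal sketch of the plain @\w+ approach. The class name, method name, and sample strings are made up for illustration. It also shows why the answers below add boundary checks: the pattern knows nothing about context, so it cuts into e-mail-like text as well.

```java
class SimpleMentionStrip {
    // Removes every '@' followed by one or more word characters ([A-Za-z0-9_]).
    static String stripMentions(String text) {
        return text.replaceAll("@\\w+", "");
    }

    public static void main(String[] args) {
        // Both mentions are removed; the surrounding spaces and '!' stay.
        System.out.println(stripMentions("hello @alice and @bob_42!"));

        // Hypothetical e-mail address: "@example" is removed too,
        // leaving "bob.com" behind.
        System.out.println(stripMentions("mail me: bob@example.com"));
    }
}
```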

2 Answers


I would use (^|\s)@\w+($|\s), because your input can also contain email addresses, for example:

a @twitter username and a [email protected] another @twitterUserName

So you can use:

String text = "a @twitter username and a [email protected] another @twitterUserName";
text = text.replaceAll("(^|\\s)@\\w+($|\\s)", "$1$2");
// Output : a  username and a [email protected] another 

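Details:

  1. (^|\s) matches either the start of the string (^) or a whitespace character (\s).
  2. @\w+ matches @ followed by one or more word characters; \w is equivalent to [A-Za-z0-9_].
  3. ($|\s) matches either the end of the string ($) or a whitespace character (\s).

The replacement "$1$2" puts back whatever the two groups captured, so the whitespace around a removed mention is preserved.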
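
One thing worth checking with this pattern is two mentions separated by a single space. Because the trailing ($|\s) consumes the space, the next mention no longer has a leading whitespace character to match against. The sketch below compares the pattern from this answer with a variant that only asserts the trailing boundary with a lookahead. The variant, the class name, and the method names are my own assumptions, not part of the original answer.

```java
class BoundaryMentionStrip {
    // Pattern from the answer: both boundaries are consumed,
    // then restored through "$1$2".
    static String consuming(String text) {
        return text.replaceAll("(^|\\s)@\\w+($|\\s)", "$1$2");
    }

    // Assumed variant: the trailing boundary is only checked with a
    // lookahead, so the whitespace stays available for the next match.
    static String lookahead(String text) {
        return text.replaceAll("(^|\\s)@\\w+(?=\\s|$)", "$1");
    }

    public static void main(String[] args) {
        String text = "hi @alice @bob bye";
        // @alice is removed, but @bob survives because its leading
        // space was already consumed by the first match.
        System.out.println("[" + consuming(text) + "]");
        // Both mentions are removed; the spaces around them remain.
        System.out.println("[" + lookahead(text) + "]");
    }
}
```

Both versions still leave the surrounding spaces in place, so a follow-up cleanup of repeated whitespace may be needed depending on the use case.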

If you want to go further and enforce the actual syntax of Twitter usernames, an article I read on the subject lists some helpful rules:

  • Your username cannot be longer than 15 characters. Your name can be longer (50 characters), but usernames are kept shorter for the sake of ease.

  • A username can only contain alphanumeric characters (letters A-Z, numbers 0-9) with the exception of underscores, as noted above. ...

Based on these rules, you can also use this regex:

(?i)(^|\s)@[a-z0-9_]{1,15}($|\s)
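
As a rough sketch of how the length-limited pattern behaves, the example below precompiles it and uses Pattern.CASE_INSENSITIVE in place of the inline (?i) flag. The class name, method name, and sample handles are hypothetical. A handle longer than 15 characters is left untouched, because after 15 characters the pattern requires whitespace or the end of the string.

```java
import java.util.regex.Pattern;

class LimitedMentionStrip {
    // Same pattern as above; CASE_INSENSITIVE replaces the inline (?i).
    private static final Pattern MENTION = Pattern.compile(
            "(^|\\s)@[a-z0-9_]{1,15}($|\\s)", Pattern.CASE_INSENSITIVE);

    static String stripMentions(String text) {
        // "$1$2" restores the whitespace captured on both sides.
        return MENTION.matcher(text).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        // 10-character handle: removed.
        System.out.println(stripMentions("ping @Short_Name now"));
        // 23-character handle: exceeds the limit, so nothing is removed.
        System.out.println(stripMentions("ping @this_handle_is_too_long now"));
    }
}
```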

Here is an alternative that does not leave doubled whitespace and also does not match email addresses:

String str = "a @twitter    @user     username and a [email protected] another @twitterUserName @test [email protected]";
System.out.println(str.replaceAll("(?<=[^\\w])@[^@\\s]+(\\s+|$)", ""));

Output:

a username and a [email protected] another [email protected]

Explanation of the parts of the regex (?<=[^\w])@[^@\s]+(\s+|$):

  1. (?<=[^\w])@ - Matches an '@' only when the character immediately before it is not a word character (a zero-width positive lookbehind on [^\w]).
  2. [^@\s]+ - Matches one or more characters that are neither '@' nor whitespace.
  3. (\s+|$) - Matches one or more whitespace characters, or the end of the string.
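
A detail that follows from the positive lookbehind: it needs an actual character before the '@', so a mention at the very start of the string is not matched. The sketch below contrasts the pattern from this answer with an assumed variant that uses a negative lookbehind, (?<!\w), instead. The variant, the class name, the method names, and the sample strings are illustrative and not from the original answer.

```java
class LookbehindMentionStrip {
    // Pattern from the answer: requires a non-word character before '@'.
    static String positiveLookbehind(String text) {
        return text.replaceAll("(?<=[^\\w])@[^@\\s]+(\\s+|$)", "");
    }

    // Assumed variant: only requires that no word character precedes '@',
    // which is also true at the very start of the string.
    static String negativeLookbehind(String text) {
        return text.replaceAll("(?<!\\w)@[^@\\s]+(\\s+|$)", "");
    }

    public static void main(String[] args) {
        // Leading mention is kept by the original pattern...
        System.out.println(positiveLookbehind("@alice hello"));
        // ...and removed by the negative-lookbehind variant.
        System.out.println(negativeLookbehind("@alice hello"));
        // Hypothetical e-mail address: 'b' precedes '@', so it is kept.
        System.out.println(negativeLookbehind("write to bob@example.com please"));
    }
}
```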
