1

I have to clean a string passed as a parameter by removing all lowercase letters and all special characters except:

  • +
  • |
  • ^
  • space
  • =>
  • <=>

So I have this string passed as a parameter:

aA azee + B => C=

and I need to clean it to get this result:

A + B => C

I tried:

string.gsub(/[^[:upper:][+|^ ]]/, "")

output: "A  + B  C" (the => is removed, and the spaces around the removed characters remain)

I don't know how to keep the => (and <=>) strings with a regex in Ruby.

I know that if I add = and > to the character class, as in string.gsub(/[^[:upper:][+|^ =>]]/, ""), the trailing = in my input string will be kept too.
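
A quick check of that last point, as a minimal sketch using the sample input from the question: once = and > are allowed individually inside the character class, every = and > survives, including the stray trailing one.

```ruby
input = "aA azee + B => C="

# "=" and ">" are now allowed as single characters, so the trailing "="
# is kept along with the "=>" operator.
result = input.gsub(/[^[:upper:][+|^ =>]]/, "")

p result  # prints "A  + B => C="
```

A character class can only decide one character at a time, which is why the multi-character operators need an alternation or a capture group instead.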

  • (<?=>)|[^[:upper:]+|^ ] replace with $1? Commented Apr 9, 2018 at 15:46
  • Why does your string contain those extra characters? Commented Apr 9, 2018 at 15:56

3 Answers

5

You can try an alternative approach: matching everything you want to keep then joining the result.

You can use this regex to match everything you want to keep:

[A-Z\d+| ^]|<?=>

As you can see, this just uses | and [] to list what you want to keep: uppercase letters, digits, +, |, space, ^, => and <=>.

Example:

"aA azee + B => C=".scan(/[A-Z\d+| ^]|<?=>/).join()

Output:

"A  + B => C"

Note that there are 2 consecutive spaces between "A" and "+". If you don't want that you can call String#squeeze.
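
Putting the scan, join and squeeze steps together, a minimal sketch could look like this. The constant KEEP and the method name clean_rule are hypothetical names chosen for the example, not part of the question.

```ruby
# Tokens to keep: uppercase letters, digits, +, |, space, ^,
# plus the multi-character operators => and <=>.
KEEP = /[A-Z\d+| ^]|<?=>/

# clean_rule is a hypothetical helper name used only for this sketch.
def clean_rule(input)
  # scan collects every kept token, join glues them back together,
  # and squeeze(" ") collapses runs of spaces into a single space.
  input.scan(KEEP).join.squeeze(" ")
end

puts clean_rule("aA azee + B => C=")  # prints: A + B => C
```

Passing " " to squeeze limits the collapsing to spaces, so repeated letters or operators such as ++ are left alone.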


5 Comments

join does not require a default argument to be explicitly passed, and squeezing would probably make sense afterwards.
This is nevertheless the best approach AFAICT.
Thank you, I think it's the best approach to solve my problem!
I would think the regex should also include \d.
/[A-Z\d+| ^]|<?=>/ is faster and also includes the ^ character that you forgot :)
1

See regex in use here

(<?=>)|[^[:upper:]+|^ ]
  • (<?=>) Captures <=> or => into capture group 1
  • [^[:upper:]+|^ ] Matches any character that is not an uppercase letter, +, |, ^ or a space. [[:upper:]] is similar to [A-Z], but in Ruby it also matches non-ASCII uppercase letters such as É

See code in use here

p "aA azee + B => C=".gsub(/(<?=>)|[^[:upper:]+|^ ]/, '\1')

Result: "A  + B => C" (with two spaces after A, left over from the removed lowercase word)
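
As a rough sketch of how the capture group behaves: when the second alternative matches, group 1 is empty, so '\1' replaces the character with nothing; when => or <=> matches, '\1' puts it straight back. The names PATTERN and strip_rule are hypothetical, and the squeeze(" ") call is an addition for collapsing the leftover double spaces.

```ruby
PATTERN = /(<?=>)|[^[:upper:]+|^ ]/

# strip_rule is a hypothetical helper name used only for this sketch.
def strip_rule(input)
  # '\1' is the captured operator, or an empty string when the
  # second alternative (an unwanted character) matched.
  input.gsub(PATTERN, '\1').squeeze(" ")
end

puts strip_rule("aA azee + B <=> C=")  # prints: A + B <=> C

# [[:upper:]] also keeps non-ASCII uppercase letters in a UTF-8 string.
puts strip_rule("\u00C9a + B")         # prints: É + B
```

Unlike the scan-based pattern, this one does not list \d, so digits are removed.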

1 Comment

I prefer this solution because it explicitly excludes characters, as opposed to including what is inferred to be the strings to be kept. Also, the POSIX expression for uppercase letters has wider applicability than A-Z.
0
r = /[a-z\s[:punct:]&&[^+ |^]]/

"The cat, 'Boots', had 9+5=4 ^lIVEs^ leF|t.".gsub(r,'')
  #=> "T  B  9+54 ^IVE^ F|"

The regular expression reads, "Match lowercase letters, whitespace and punctuation that are not the characters '+', ' ', '|' and '^'". && within a character class is the set intersection operator. Here it intersects the set of characters that match a-z\s[:punct:] with those that match [^+ |^]. (Note that because of \s, whitespace other than the space character, such as tabs and newlines, is removed as well.) For more information, search for "character classes also support the && operator" in the Regexp documentation.
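
To see the intersection operator in isolation, here is a small sketch with sample strings made up for illustration:

```ruby
# [a-z&&[^aeiou]] is "lowercase letters" intersected with "not a vowel".
consonants = "regexp".scan(/[a-z&&[^aeiou]]/)
p consonants  # prints ["r", "g", "x", "p"]

# [\s&&[^ ]] is "whitespace" intersected with "not a space":
# the tab is removed while the plain space is kept.
no_tabs = "a b\tc".gsub(/[\s&&[^ ]]/, "")
p no_tabs     # prints "a bc"
```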

I have not included '=>' and '<=>' as those, unlike '+', ' ', '|' and '^', are multi-character strings and therefore require a different approach than simply removing certain characters.
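
One possible "different approach", sketched under the assumption that the operators should be protected and everything else cleaned with the character-level pattern above: split the string on the operators with a capture group, so they come back as separate pieces, and apply gsub only to the remaining pieces. The method name clean_with_operators is hypothetical, and the result assumes a Ruby version where [[:punct:]] matches =, as the output shown above implies.

```ruby
# Character-level pattern from the answer above.
R = /[a-z\s[:punct:]&&[^+ |^]]/

# clean_with_operators is a hypothetical helper name for this sketch.
def clean_with_operators(input)
  # split with a capture group keeps "=>" and "<=>" in the result array.
  pieces = input.split(/(<?=>)/)
  cleaned = pieces.map do |piece|
    # Leave the operators untouched; strip unwanted characters elsewhere.
    piece.match?(/\A<?=>\z/) ? piece : piece.gsub(R, "")
  end
  cleaned.join
end

p clean_with_operators("aA azee + B => C=")  # prints "A  + B => C"
```

The double space after A remains, since this sketch only removes characters; String#squeeze can collapse it if needed.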
