7

Splitting on whitespace, period, comma or double quotes, and not on single quotes:

str = %Q{this is the.string    to's split,real "ok" nice-like.}
str.split(/\s|\.|,|"/)
=> ["this", "is", "the", "string", "", "", "", "to's", "split", "real", "", "ok", "", "nice-like"]

How can I elegantly remove the empty strings?

How can I elegantly remove strings that are shorter than MIN_LENGTH?

6 Answers

8

split is not the right tool in this case. You should use scan, which lets you describe the words you want instead of the delimiters between them.

str = %Q{this is the.string    to's split,real "ok" nice-like.}
str.scan(/[\w'-]+/)
# => ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]

To match only strings that are MIN_LENGTH characters or longer, add a quantifier:

MIN_LENGTH = 3
str.scan(/[\w'-]{#{MIN_LENGTH},}/)
# => ["this", "the", "string", "to's", "split", "real", "nice-like"]

When to use split, when to use scan

  • When the delimiters are messy and making a regex match them is difficult, use scan.
  • When the substrings to extract are messy and making a regex match them is difficult, use split.
  • When you want to impose conditions on the form of the substrings to be extracted, use scan.
  • When you want to impose conditions on the form of the delimiters, use split.
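
As a rough sketch of the rules above (not part of the original answer; both sample strings and all variable names are invented for illustration), the same choice can be seen on two small inputs, one with messy delimiters and one with messy fields:

```ruby
# Hypothetical inputs, invented for this illustration.
messy_delimiters = "one!two ~ three?four:five"
messy_fields     = "O'Neil, A.|42|likes: tea & scones"

# The words are easy to describe (\w+) but the delimiters are not: scan.
words = messy_delimiters.scan(/\w+/)

# The delimiter is easy to describe ("|") but the fields are not: split.
fields = messy_fields.split("|")

p words   # => ["one", "two", "three", "four", "five"]
p fields  # => ["O'Neil, A.", "42", "likes: tea & scones"]
```

In the first case a split pattern would have to list every punctuation mark that might appear; in the second, a scan pattern would have to anticipate every character a field might contain.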

2 Comments

This is actually much better for what I was trying to do. split is not good because you have to figure out all the other possible delimiters, like !, --, ?, ~, :, etc.
Tobias answered the first question best: str.split /[\s\.,"]+/
8

I'm not entirely clear on the problem domain, but if you just want to avoid the empty strings, why not split on one or more occurrences of your separators?

str.split /[\s\.,"]+/
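
For reference, here is a small sketch of what this approach returns on the string from the question, plus one edge case worth knowing about: when the input starts with a separator, a regex split still yields one leading empty string. The second input (`quoted`) is a made-up example, and the `reject(&:empty?)` cleanup is only one possible way to handle it.

```ruby
str = %Q{this is the.string    to's split,real "ok" nice-like.}

# One or more separators in a row count as a single delimiter.
words = str.split(/[\s.,"]+/)
p words
# => ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]

# Hypothetical input that begins with a separator (a double quote).
quoted = %Q{"ok" then}
parts = quoted.split(/[\s.,"]+/)
p parts                   # => ["", "ok", "then"]

# Dropping empty strings afterwards covers that case.
p parts.reject(&:empty?)  # => ["ok", "then"]
```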


6

I would think a simple way to do that is as follows:

str.split(/\s|\.|,|"/).select{|s| s.length >= MIN_LENGTH}
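
A quick sketch of how this behaves on the string from the question, assuming MIN_LENGTH = 3 (the value is an assumption here; the question does not fix one). Empty strings have length 0, so the same select removes them as long as MIN_LENGTH is at least 1.

```ruby
str = %Q{this is the.string    to's split,real "ok" nice-like.}
MIN_LENGTH = 3  # assumed value for the example

# Split on each single separator, then keep only the long enough pieces.
# The empty strings produced by adjacent separators are filtered out too.
long_words = str.split(/\s|\.|,|"/).select { |s| s.length >= MIN_LENGTH }

p long_words
# => ["this", "the", "string", "to's", "split", "real", "nice-like"]
```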

1 Comment

Works and easy to understand.
2

Try the following:

str.split(/\s*[.,"\s]\s*/)


2

There are several ways to do this. Note that the scan variant, unlike the two split variants, also breaks nice-like into two words:

 > str.split(/[\s\.,"]/) - [""]
=> ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]

 > str.split(/[\s\.,"]/).select{|sub_string| sub_string.present?}
=> ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]

 > str.scan /\w+'?\w+/
=> ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice", "like"]
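
One caveat on the second variant: `present?` is provided by ActiveSupport (Rails), not by core Ruby, so it raises NoMethodError in a plain Ruby script. A minimal plain-Ruby sketch of the same filtering, using `String#empty?` instead, could look like this:

```ruby
str = %Q{this is the.string    to's split,real "ok" nice-like.}

# Plain Ruby: drop the empty strings left behind by adjacent separators.
# No whitespace-only pieces can occur here, because whitespace is itself
# a separator, so empty? is enough for this particular pattern.
words = str.split(/[\s.,"]/).reject(&:empty?)

p words
# => ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]
```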


1
MIN_LENGTH = 2

new_strings = str.split(/\s|\.|,|"/).reject{ |s| s.length < MIN_LENGTH }

