I use this function to split the string:

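#include <regex>
#include <string>
#include <vector>

// Splits stringToSplit on every match of regexPattern and returns the pieces between the matches.
std::vector<std::string> splitString(const std::string& stringToSplit, const std::string& regexPattern)
{
    std::vector<std::string> result;

    const std::regex rgx(regexPattern);
    // Submatch index -1 makes the iterator yield the text between matches, not the matches themselves.
    std::sregex_token_iterator iter(stringToSplit.begin(), stringToSplit.end(), rgx, -1);

    for (std::sregex_token_iterator end; iter != end; ++iter)
    {
        result.push_back(iter->str());
    }

    return result;
}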

Now, if I want to split a string line by line (say, after reading a file's contents into a single variable), I do this:

auto vec = splitString(fileContent, "\\n");

On Windows, I get this:

line 1 \r
line 2 \r

This happens because Windows line endings are \r\n, so splitting on \n alone leaves the \r at the end of each line. I have tried using $ instead, but without success. What is the right way to match Windows line endings, too?

  • Use [\\r\\n]+. auto vec = splitString(fileContent, "[\\r\\n]+"); Commented May 5, 2015 at 10:41
  • Would it work in Linux, OS X, iOS, Android? Commented May 5, 2015 at 10:42
  • Since the \r and \n are line separators, I think it will run on all OSes. Commented May 5, 2015 at 10:43
  • Please let me know if the regex works for you, then I could post it as answer. Commented May 5, 2015 at 11:01
  • @Narek: Yes, it will work on these systems (all use LF as the newline). But unfortunately it will not work on the ATARI 800 :) Commented May 5, 2015 at 11:09

1 Answer

On Linux/Unix, OS X, iOS, and Android the line separator is \n, while Windows uses the combination \r\n. A simple way to handle both is to put \r and \n into a character class and apply the + quantifier.

Thus, [\\r\\n]+ should "do the trick":

auto vec = splitString(fileContent, "[\\r\\n]+");
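To see what this pattern does with mixed line endings, here is a minimal, self-contained sketch. The splitString helper is repeated from the question so the snippet compiles on its own; the wrapper name splitNonEmptyLines is hypothetical and only used for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Helper from the question, repeated so this sketch is self-contained.
std::vector<std::string> splitString(const std::string& stringToSplit, const std::string& regexPattern)
{
    std::vector<std::string> result;

    const std::regex rgx(regexPattern);
    // Submatch index -1 yields the text between matches.
    std::sregex_token_iterator iter(stringToSplit.begin(), stringToSplit.end(), rgx, -1);

    for (std::sregex_token_iterator end; iter != end; ++iter)
    {
        result.push_back(iter->str());
    }

    return result;
}

// Hypothetical wrapper: treats any run of \r and \n characters as one separator,
// so "\r\n", "\n" and "\r\n\r\n" all end exactly one token.
std::vector<std::string> splitNonEmptyLines(const std::string& text)
{
    return splitString(text, "[\\r\\n]+");
}
```

Called as splitNonEmptyLines("line 1\r\nline 2\r\n\r\nline 4\n"), this returns the three strings "line 1", "line 2" and "line 4": no trailing \r is left on any line, and the blank line between "line 2" and "line 4" is swallowed by the + quantifier.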

EDIT:

As @FabioFracassi mentions, this will remove empty lines, because the + quantifier treats a run of consecutive line breaks as a single separator. If empty lines should be preserved in the output, you can use

auto vec = splitString(fileContent, "(?:\\r\\n|\\r|\\n)");

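The alternatives are listed with the longest option first because the regex engine tries them from left to right (at least by default) and takes the first one that matches. If \r came before \r\n, the \r and \n of a Windows line ending would be matched as two separate separators, producing a spurious empty line.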
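The following sketch shows the empty-line-preserving variant in isolation. As before, splitString is copied from the question so the snippet stands alone, and the wrapper name splitLinesKeepEmpty is a hypothetical name chosen for this example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Helper from the question, repeated so this sketch is self-contained.
std::vector<std::string> splitString(const std::string& stringToSplit, const std::string& regexPattern)
{
    std::vector<std::string> result;

    const std::regex rgx(regexPattern);
    // Submatch index -1 yields the text between matches.
    std::sregex_token_iterator iter(stringToSplit.begin(), stringToSplit.end(), rgx, -1);

    for (std::sregex_token_iterator end; iter != end; ++iter)
    {
        result.push_back(iter->str());
    }

    return result;
}

// Hypothetical wrapper: each single line break (\r\n, \r or \n) is its own separator,
// so two consecutive line breaks produce an empty string between them.
std::vector<std::string> splitLinesKeepEmpty(const std::string& text)
{
    // \r\n comes first so a Windows line ending is consumed as one separator.
    return splitString(text, "(?:\\r\\n|\\r|\\n)");
}
```

For the input "a\r\n\r\nb\nc" this returns "a", "", "b", "c", so the blank line survives as an empty string. Note that a line break at the very end of the input does not add a trailing empty element, because the token iterator skips an empty final suffix.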


2 Comments

Won't that behave inconsistently on (consecutive) empty lines?
That depends on whether empty lines must be present in the resulting array. If empty lines should be preserved, the regex must indeed be "(?:\\r\\n|\\r|\\n)".
