4

I want to get all digits in a std::string but without using a loop (myself; what the code I'm calling uses, I don't mind). An alternative view of the request is: remove all non-digits from the string, leaving only the digits. I know that I can find all digit in a string using code like this:

std::string get_digits(std::string input) {
    std::string::size_type next_digit(0u);
    for (std::string::size_type pos(0u);
         input.npos != (pos = input.find_first_of("0123456789"));
         ++pos) {
        input[next_digit++] = input[pos];
    }
    input.resize(next_digit);
    return input;
}

However, this function uses a loop. std::string doesn't provide a function find_all() or something! Ideally, the string is maniulated in-place (the code above moves it but it is easily changed to take a reference).

When there are multiple alternatives, I'll promise to post profiling results of how good the different approaches work on some lengthy text.

9
  • recursion might or might not work for this. Commented Dec 17, 2014 at 23:47
  • @Bot: I think recursion is OK but it should work with large strings, too. That is, creating a stackoverflow is out of question. Commented Dec 17, 2014 at 23:48
  • 1
    I believe you could use std::partition to move all the digits to the front of the string. If order matters, then there is std::stable_partition. Commented Dec 17, 2014 at 23:48
  • 3
    std::remove_if (followed by erase). Should be faster than std::partition, since it only cares about first part, and doesn't mind leaving random garbage in the second part. Preserves order (of elements not removed) for free. Commented Dec 17, 2014 at 23:52
  • 1
    @KerrekSB: I have tried to clarify the question (I mean find all digits as they they are in the original and leave them in the result string). I have also removed the std::move(): you are right, the arguments are considered local variables and are automatically moved from (I don't think it harms, though, as I think arguments are not eligible for copy elision). Commented Dec 18, 2014 at 0:15

9 Answers 9

7

Although I primarily had hoped for 5 quick answers (which wasn't achieved, sigh) the answers and comments led to some interesting approaches I hadn't thought of myself. My personal expectation had been that the answers effectively would result in:

  • If you want to be fast, use

    input.erase(std::remove_if(input.begin(), input.end(),
                               [](unsigned char c){ return !std::isdigit(c); }),
                input.end());
    
  • If you want to be concise, use

    text = std::regex_replace(text, std::regex(R"(\D)"), "");
    

Instead, there were a number of approaches I hadn't even considered:

  • Use a recursive function!

  • Use std::partition() which seems to require extra work (retain the characters which will be thrown out) and changes the order.

  • Use std::stable_partition() which seems to require even more work but doesn't change the order.

  • Use std::sort() and extract the substring with the relevant characters although I don't know how to make that one retain the original sequence of character. Just using a stable version doesn't quite to it.

Putting the different approaches together and using a number of variations on how to classify the characters, led to a total of 17 version of roughly the same operation (the code is on github). Most of the versions use std::remove_if() and std::string::erase() but differ in the classification of digits.

  1. remove_if() with [](char c){ return d.find(c) == d.npos; }).
  2. remove_if() with [](char c){ return std::find(d.begin(), d.end(), c) == d.end(); }
  3. remove_if() with [](char c){ return !std::binary_search(d.begin(), d.end()); }
  4. remove_if() with [](char c){ return '0' <= c && c <= '9'; }
  5. remove_if() with [](unsigned char c){ return !std::isdigit(c); } (the char is passed as unsigned char to avoid undefined behavior in case c is a char with a negative value)
  6. remove_if() with std::not1(std::ptr_fun(std::static_cast<int(*)(int)>(&std::isdigit))) (the cast is necessary to determine the correct overload: std::isdigit() happens to be overloaded).
  7. remove_if() with [&](char c){ return !hash.count(c); }
  8. remove_if() with [&](char c){ return filter[c]; } (the code initializing actually uses a loop)
  9. remove_if() with [&](char c){ return std::isidigit(c, locale); }
  10. remove_if() with [&](char c){ return ctype.is(std::ctype_base::digit, c); }
  11. str.erase(std::parition(str.begin(), str.end(), [](unsigned char c){ return !std::isdigit(c); }), str.end())
  12. str.erase(std::stable_parition(str.begin(), str.end(), [](unsigned char c){ return !std::isdigit(c); }), str.end())
  13. the "sort-approach" describe in one of the answers
  14. the copy_if() approach described in one of the answers
  15. the recursive approach describe in one of the answers
  16. text = std::regex_replace(text, std::regex(R"(\D)"), ""); (I didn't manage to get this to work on icc)
  17. like 16 but with an already built regular expression

I have run the benchmark on a MacOS notebook. Since results like this are reasonably easy to graph with Google Chars, here is a graph of the results (although with the versions using regexps removed as these would cause the graph to scale such that the interesting bit isn't really visible). The results of the benchmarks in form of a table:

    test                          clang   gcc     icc
 1  use_remove_if_str_find        22525   26846  24815
 2  use_remove_if_find            31787   23498  25379
 3  use_remove_if_binary_search   26709   27507  37016
 4  use_remove_if_compare          2375    2263   1847
 5  use_remove_if_ctype            1956    2209   2218
 6  use_remove_if_ctype_ptr_fun    1895    2304   2236
 7  use_remove_if_hash            79775   60554  81363
 8  use_remove_if_table            1967    2319   2769
 9  use_remove_if_locale_naive    17884   61096  21301
10  use_remove_if_locale           2801    5184   2776
11  use_partition                  1987    2260   2183
12  use_stable_partition           7134    4085  13094
13  use_sort                      59906  100581  67072
14  use_copy_if                    3615    2845   3654
15  use_recursive                  2524    2482   2560
16  regex_build                  758951  531641 
17  regex_prebuild               775450  519263
Sign up to request clarification or add additional context in comments.

2 Comments

Thanx posting results, very interesting regarding both different answers (algorithms) and compiler used :). I thought that my recursive approach will perform much worse after reading you plan to test it with long input.
@DietmarKühl Very interesting. Some unexpected results here! Might be worth plotting the bar chart on a log-scale as well, so that the huge numbers don't swamp the variation in the smaller ones.
6

One way would be to use std::copy_if (or std::remove_if):

std::string get_digits(std::string input) {
    std::string result;
    std::copy_if(
        input.begin(), 
        input.end(), 
        std::back_inserter(result), 
        [](char c) { return '0' <= c && c <= '9'; });
    return result;
}

Obviously this uses a loop internally, but you said you don't care about that...

Edit: With std::remove_if:

std::string get_digits_remove(std::string input) {
    auto itErase = std::remove_if(
        input.begin(), 
        input.end(), 
        [](char c) { return !('0' <= c && c <= '9'); });
    input.erase(itErase, input.end());
    return input;
}

2 Comments

You could use std::isdigit(c) from <cctype>
@TemplateRex True enough, and several of the other answers here do use isdigit, although it won't provide exactly the same behaviour as the original find_first_of.
2

I would start with a nice primitive function that composes the std algorithms you want to use:

template<class Container, class Test>
void erase_remove_if( Container&& c, Test&& test ) {
  using std::begin; using std::end;
  auto it = std::remove_if( begin(c), end(c), std::forward<Test>(test) );
  c.erase( it, end(c) );
}

then we write save digits:

std::string save_digits( std::string s ) {
  erase_remove_if( s,
    [](char c){
      if (c > '9') return true;
      return c < '0';
    }
  );
  return s;
}

Comments

2

You can do this in-place with std::partition:

std::string get_digits(std::string& input)
{
    auto split =
        std::partition( std::begin(input), std::end(input), [](char c){return ::isdigit(c);} );

    size_t len = std::distance( std::begin(input), split );
    input.resize( len );
    return input;
}

std::partition does not guarantee order, so if order matters, use std::stable_partition

Comments

1
// terrible no-loop solution
void getDigs(const char* inp, char* dig)
{
    if (!*inp)
        return;
    if (*inp>='0' && *inp<='9')
    {
        *dig=*inp; 
        dig++;
        *dig=0;
    }
    getDigs(inp+1,dig);
}

Comments

1

Maybe the simple answer suffices?

std::string only_the_digits(std::string s)
{
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](char c) { return !::isdigit(c); }), s.end());
    return s;
}

The downside of this approach is that it unconditionally creates a copy of the input data. If there are lots of digits, then that's OK, since we're reusing that object. Alternatively, you can make this function just modify the string in-place (void strip_non_digits(std::string &).)

But if there are only few digits and you want to leave the input untouched, then you may prefer to create a new (small) output object and not copy the input. This can be done with a referential view of the input string, e.g. as provided by the Fundamentals TS, and using copy_if:

std::string only_the_digits(std::experimental::string_view sv)
{
    std::string result;
    std::copy_if(sv.begin(), sv.end(), std::back_inserter(::isdigit));
    return result;
}

6 Comments

I'd better not use this version with my name in the string lest I risk undefined behavior!
@DietmarKühl: If you encode it properly :-) Or substitute your own favourite notion of 'digit'.
The lambda can be replaced with std::not1(std::ptr_fun(::isdigit)), although that's arguably less readable.
@remyabel: Yeah, I had that in an initial version, but then I sobered up. Maybe std::experimental::not should come in here.
@remyabel: I guess, it doesn't really matter much as isdigit is synchronized anyway but the version using a lambda can easily be inline while the version using a function pointer seems to be harder to inline (and it provides an easy hook to avoid undefined behavior). ... and I thought that not is a keyword (well, an alternate token), not available to a library!
|
1

No loop solution in 4 steps (but with error checking, more than 4 statements):

1) sort the string, using a suitable sort (incrementing order) ... now all digits will be together, concatentated

2) use std::string.find_first_of() to find the index of the first digit (be sure to check for a digit found)

3) use std::string.find_last_of() to find the index of the last digit (be sure to check for a digit found)

4) use std::string::substr() and the 2 previous indexes to extract the digits

1 Comment

I'm just putting together the promised benchmark and noticed the issue while writing. However, find_first_of() and find_last_of() plus a strategic check if there are any digits should do the trick.
1

This is about as succinct as I can get it I think.

std::string get_digits(std::string input)
{
    input.erase(std::stable_partition(
                               std::begin(input),
                               std::end(input),
                               ::isdigit),
                std::end(input));

    return input;
}

Features:

  1. passes sink argument by value to take advantage of copy elision in c++11
  2. preserves order of digits.
  3. No user code - uses only peer-reviewed stl functions. Chance of bugs - zero.

This would be the stl-style iterator-based approach:

template<class InIter, class OutIter>
OutIter collect_digits(InIter first, InIter last, OutIter first_out)
{
    return std::copy_if(first, last, first_out, ::isdigit);
}

This has a number of advantages:

  1. input can be any iterable range of chars, not just strings
  2. can be chained by virtue of returning the output iterator
  3. allows destination container/iterator (including an ostream_iterator)
  4. with a little love, it could be made to handle unicode chars etc

fun example:

#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>

template<class InIter, class OutIter>
OutIter collect_digits(InIter first, InIter last, OutIter first_out)
{
    return std::copy_if(first, last, first_out, ::isdigit);
}

using namespace std;

int main()
{
    char chunk1[] = "abc123bca";
    string chunk2 { "def456fed" };
    vector<char> chunk3 = { 'g', 'h', 'i', '7', '8', '9', 'i', 'h', 'g' };

    string result;
    auto pos = collect_digits(begin(chunk1), end(chunk1), back_inserter(result));
    pos = collect_digits(begin(chunk2), end(chunk2), pos);
    collect_digits(begin(chunk3), end(chunk3), pos);
    cout << "first collect: " << result << endl;

    cout << "second collect: ";
    collect_digits(begin(chunk3),
                   end(chunk3),
                   collect_digits(begin(chunk2),
                                  end(chunk2),
                                  collect_digits(begin(chunk1),
                                                 end(chunk1),
                                                 ostream_iterator<char>(cout))));
    cout << endl;

    return 0;
}

Comments

0

I use this one-liner macro as long as #include <regex> comes before it or you otherwise include that:

#define DIGITS_IN_STRING(a) std::regex_replace(a, std::regex(R"([\D])"), "")

1 Comment

If performance matters, I'd recommend against regexps for this task: see the measurements above. It seems regexps would take a few hundred times longer than the most efficient approach.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.