3

I have a std::vector of std::strings, each of which is a filename. Suppose the filenames have the format some_name_n.xyz, where n is a number.

The problem is that some_name_10.xyz compares less than some_name_2.xyz under ordinary string comparison. The files are produced by some other process.

What is the least painful way to sort them so that the number after '_' is considered for comparison, and not just its length?

11
  • You can simply rename them to be like '_%04d'. Commented Mar 4, 2014 at 17:59
  • Write your own comparator functor and pass it to sort? Commented Mar 4, 2014 at 18:00
  • And why without using default algorithm? What default algorithm? Commented Mar 4, 2014 at 18:01
  • Check out the following answers for references. Commented Mar 4, 2014 at 18:10
  • @Manu343726: that's not a duplicate at all... it requires a normal "<" comparison for a specific field in the structures being sorted, whereas this one needs special handling of embedded numbers. Commented Mar 4, 2014 at 19:45

4 Answers

1

std::sort allows you to specify a binary function for comparing two elements: http://www.cplusplus.com/reference/algorithm/sort/

Now it's just a matter of constructing that binary function. A partial example is here: Sorting std::strings with numbers in them?
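As a rough sketch of such a binary function, assuming every name follows the some_name_n.xyz pattern from the question: the helper names trailing_number, by_trailing_number and sort_by_trailing_number are made up for illustration, and the parser assumes the number fits in a long.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the integer between the last '_' and the
// following '.', or -1 when the name does not match some_name_n.xyz.
// Assumes the number is small enough to fit in a long.
long trailing_number(const std::string& name)
{
    const std::size_t us = name.rfind('_');
    if (us == std::string::npos) return -1;
    const std::size_t dot = name.find('.', us);
    const std::size_t end = (dot == std::string::npos) ? name.size() : dot;
    if (end == us + 1) return -1;            // nothing between '_' and '.'
    long value = 0;
    for (std::size_t i = us + 1; i < end; ++i) {
        if (name[i] < '0' || name[i] > '9') return -1;
        value = value * 10 + (name[i] - '0');
    }
    return value;
}

// Binary predicate for std::sort: orders by the number after the last '_',
// falling back to plain string comparison when the numbers are equal.
bool by_trailing_number(const std::string& a, const std::string& b)
{
    const long na = trailing_number(a);
    const long nb = trailing_number(b);
    if (na != nb) return na < nb;
    return a < b;
}

// Sorts the file names in place using the predicate above.
void sort_by_trailing_number(std::vector<std::string>& names)
{
    std::sort(names.begin(), names.end(), by_trailing_number);
}
```

This only looks at the trailing number first, so it suits the case where all files share the same prefix; names with different prefixes would be interleaved by number.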


1

The least painful way is to put appropriate leading zeroes into your file names (even writing a second script that takes the generated names and renames them may be easier than writing your own sort routine).

The second least painful way is to write your own sort predicate that compares the _-delimited numbers as numbers rather than lexicographically.
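To illustrate the leading-zero idea, here is a minimal sketch that computes the padded form of a name. zero_pad_number is a hypothetical helper, the width is whatever you choose (4 matches the '_%04d' suggestion in the comments), and the actual renaming on disk is left out.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: pads the digits between the last '_' and the next '.'
// with leading zeroes up to `width` characters. Names that do not match the
// some_name_n.xyz pattern, or are already wide enough, are returned unchanged.
std::string zero_pad_number(const std::string& name, std::size_t width)
{
    const std::size_t us = name.rfind('_');
    if (us == std::string::npos) return name;
    const std::size_t dot = name.find('.', us);
    if (dot == std::string::npos) return name;
    const std::size_t len = dot - us - 1;    // number of digits present
    if (len == 0 || len >= width) return name;
    for (std::size_t i = us + 1; i < dot; ++i)
        if (name[i] < '0' || name[i] > '9') return name;
    std::string padded = name;
    padded.insert(us + 1, width - len, '0'); // insert the missing zeroes
    return padded;
}
```

Once every name is padded to the same width, the default std::sort on the strings already gives numeric order, because equal-length digit runs compare the same way lexicographically and numerically.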


1

Here's a comparison that handles any number of numeric values embedded in the strings:

#include <cstdlib>
#include <cctype>

#ifdef  _MSC_VER
#define strtoll _strtoi64
#endif

int cmp(const char* lhs, const char* rhs)
{
    while (*lhs || *rhs)
    {
        // Cast to unsigned char: passing a negative char to isdigit is undefined.
        if (isdigit(static_cast<unsigned char>(*lhs)) &&
            isdigit(static_cast<unsigned char>(*rhs)))
        {
            // Both sides start a run of digits: compare them as numbers.
            char* l_end;
            char* r_end;
            long long l = strtoll(lhs, &l_end, 10);
            long long r = strtoll(rhs, &r_end, 10);
            if (l < r) return -1;
            if (l > r) return 1;
            // Equal numbers: continue after the digits on both sides.
            lhs = l_end;
            rhs = r_end;
        }
        else
            if (*lhs != *rhs)
                return *lhs - *rhs;
            else
                ++lhs, ++rhs;
    }
    return *lhs - *rhs;
}

It's deliberately "C style" so it can be applied directly and efficiently to character arrays. It returns a negative number if lhs < rhs, 0 if they're equal, and a positive number if lhs > rhs.

You can call this from a comparison functor or lambda passed to std::sort.
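For example, a lambda wrapper might look roughly like this. natural_sort is a hypothetical name, and cmp is repeated (with the same logic as above) only so that the snippet compiles on its own.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// Same comparison as above, repeated so this snippet is self-contained.
int cmp(const char* lhs, const char* rhs)
{
    while (*lhs || *rhs)
    {
        if (std::isdigit(static_cast<unsigned char>(*lhs)) &&
            std::isdigit(static_cast<unsigned char>(*rhs)))
        {
            char* l_end;
            char* r_end;
            long long l = std::strtoll(lhs, &l_end, 10);
            long long r = std::strtoll(rhs, &r_end, 10);
            if (l < r) return -1;
            if (l > r) return 1;
            lhs = l_end;
            rhs = r_end;
        }
        else if (*lhs != *rhs)
            return *lhs - *rhs;
        else
            ++lhs, ++rhs;
    }
    return *lhs - *rhs;
}

// Hypothetical wrapper: sorts the names in place. std::sort needs a
// "less than" predicate, so the three-way result is tested against 0.
void natural_sort(std::vector<std::string>& names)
{
    std::sort(names.begin(), names.end(),
              [](const std::string& a, const std::string& b)
              { return cmp(a.c_str(), b.c_str()) < 0; });
}
```

The only adaptation needed is turning the negative/zero/positive result into the bool that std::sort expects.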


1

You can write a custom comparator, something like the following:

#include <algorithm>
#include <cctype>
#include <string>

struct Comp{

    // Returns the number that starts at the first digit in the name and
    // runs up to the next '.', e.g. 10 for "some_name_10.xyz".
    static int get_num (const std::string& a)
    {
        auto it1 = std::find_if( a.begin(), a.end(),
                               [](unsigned char c){ return std::isdigit(c) != 0 ;}) ;
        /* No digit at all: return 0 instead of letting std::stoi throw */
        if (it1 == a.end())
            return 0 ;
        /* Look for the '.' only after the first digit */
        auto it2 = std::find( it1, a.end(), '.' ) ;
        return std::stoi (std::string( it1, it2 )) ;
    }

    bool operator () (const std::string& a, const std::string& b) const
    {
        return get_num (a) < get_num (b) ;
    }

};

See demo here

