How to implement a natural sort algorithm in c++?

Question

I'm sorting strings that consist of text and numbers. I want the sort to order the numeric parts as numbers, not alphanumerically.

For example I want: abc1def, ..., abc9def, abc10def

instead of: abc10def, abc1def, ..., abc9def

Does anyone know an algorithm for this (in particular in C++)?

Thanks

@dmckee - to be fair he didn't use the term (as I didn't when I asked the same question) "Natural Sorting" - that was edited in later.
– Paul Tomblin, Commented Mar 13, 2009 at 18:35

Paul Tomblin · Accepted Answer · 2023-06-01 12:05:14Z

18

I asked this exact question (although in Java) and got pointed to ~~http://www.davekoelle.com/alphanum.html~~ which has an algorithm and implementations of it in many languages.

Update 14 years later: Dave Koelle’s blog has gone offline and I can’t find his actual algorithm, but here’s an implementation: https://github.com/cblanc/koelle-sort

Update 14 years and 5 months after the original answer: In the comments, it was pointed out that Dave Koelle’s blog is on the Wayback Machine at https://web.archive.org/web/20210207124255/davekoelle.com/alphanum.html
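As a rough illustration of the chunk-based idea behind alphanum-style comparison, here is a minimal C++ sketch. It is not Dave Koelle's code: the names `isDigitChar`, `splitChunks`, and `chunkedNaturalLess` are made up for this example, and it assumes every digit run fits in `unsigned long long` (`std::stoull` throws `std::out_of_range` otherwise).

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if c is a decimal digit.
static bool isDigitChar(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: split s into maximal runs of digits and non-digits,
// e.g. "abc10def" -> {"abc", "10", "def"}.
static std::vector<std::string> splitChunks(const std::string& s) {
    std::vector<std::string> chunks;
    std::size_t i = 0;
    while (i < s.size()) {
        const bool digits = isDigitChar(s[i]);
        std::size_t j = i;
        while (j < s.size() && isDigitChar(s[j]) == digits) ++j;
        chunks.push_back(s.substr(i, j - i));
        i = j;
    }
    return chunks;
}

// Returns true if a sorts before b in natural order.
// Assumption: every digit run fits in unsigned long long.
bool chunkedNaturalLess(const std::string& a, const std::string& b) {
    const std::vector<std::string> ca = splitChunks(a);
    const std::vector<std::string> cb = splitChunks(b);
    const std::size_t n = std::min(ca.size(), cb.size());
    for (std::size_t k = 0; k < n; ++k) {
        const bool da = isDigitChar(ca[k][0]);
        const bool db = isDigitChar(cb[k][0]);
        if (da && db) {
            // Two numeric chunks: compare by value.
            const unsigned long long va = std::stoull(ca[k]);
            const unsigned long long vb = std::stoull(cb[k]);
            if (va != vb) return va < vb;
        } else if (ca[k] != cb[k]) {
            return ca[k] < cb[k];  // plain character comparison
        }
    }
    return ca.size() < cb.size();  // the string with fewer chunks sorts first
}
```

It can be passed straight to the standard sort, e.g. `std::sort(v.begin(), v.end(), chunkedNaturalLess);` for a `std::vector<std::string> v`. Splitting into temporary strings keeps the sketch short but allocates on every comparison.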

edited Jun 1, 2023 at 12:05

answered Mar 13, 2009 at 11:31

Paul Tomblin


3 Comments

Dominic Rodger Over a year ago

+1 Thanks Paul - I looked for natural sort and the C++ tag, but didn't find anything.

김선달 Over a year ago

The link is dead

Hedede Over a year ago

It is available through the Wayback Machine: web.archive.org/web/20210207124255/davekoelle.com/alphanum.html

Josh Kelley · Accepted Answer · 2014-11-05 19:50:13Z

9

Several natural sort implementations for C++ are available. A brief review:

natural_sort<> - based on Boost.Regex.
- In my tests, it's roughly 20 times slower than other options.
Dirk Jagdmann's alnum.hpp, based on Dave Koelle's alphanum algorithm
- Potential integer overflow issues for values over MAXINT
Martin Pool's natsort - written in C, but trivially usable from C++.
- The only C/C++ implementation I've seen that offers a case-insensitive version, which would seem to be a high priority for a "natural" sort.
- Like the other implementations, it doesn't actually parse decimal points, but it does special-case leading zeroes (anything with a leading 0 is assumed to be a fraction), which is a little weird but potentially useful.
- PHP uses this algorithm.
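The integer overflow issue mentioned above can be sidestepped by never converting digit runs to an integer type. The following is a minimal sketch of that idea, not taken from any of the libraries listed; the function name `compareDigitRuns` is hypothetical, and it assumes both arguments contain only decimal digits.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: compares two strings consisting only of decimal digits
// by numeric value, without converting them to an integer type.
// Returns a negative value, zero, or a positive value (like strcmp).
int compareDigitRuns(const std::string& a, const std::string& b) {
    // Skip leading zeros; they do not change the numeric value.
    std::size_t ia = 0, ib = 0;
    while (ia < a.size() && a[ia] == '0') ++ia;
    while (ib < b.size() && b[ib] == '0') ++ib;

    const std::size_t lenA = a.size() - ia;
    const std::size_t lenB = b.size() - ib;
    // More significant digits means a larger number.
    if (lenA != lenB) return lenA < lenB ? -1 : 1;
    // Same digit count: the first differing digit decides.
    return a.compare(ia, lenA, b, ib, lenB);
}
```

Because only lengths and characters are compared, digit runs of any length are handled. Note that this sketch treats "007" and "7" as equal; a caller that wants a deterministic order for such pairs would need its own tie-break.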

edited Nov 5, 2014 at 19:50

answered Nov 5, 2014 at 19:43

Josh Kelley


Cœur · Accepted Answer · 2018-10-21 03:29:05Z

6

This is known as natural sorting. There's an algorithm here that looks promising.

Be careful of problems with non-ASCII characters (see Jeff's blog entry on the subject).

edited Oct 21, 2018 at 3:29

Cœur

answered Mar 13, 2009 at 11:18

Dominic Rodger

2 Comments

Will Over a year ago

That's sweet, but I don't have access to Boost :-|

Dominic Rodger Over a year ago

Then it looks like Paul Tomblin's answer may be more helpful to you - the C++ variant doesn't seem to use anything funky.

Community · Accepted Answer · 2017-05-23 12:18:33Z

2

Partially reposting another answer of mine:

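#include <cctype>
#include <sstream>
#include <string>

std::string toUpper(std::string s); // defined below

bool compareNat(const std::string& a, const std::string& b){
    // An empty string sorts before any non-empty one. Two empty strings are
    // equal, so return false to keep the ordering strict for std::sort.
    if (a.empty())
        return !b.empty();
    if (b.empty())
        return false;
    // Cast to unsigned char: std::isdigit is undefined for negative char values.
    const bool aDigit = std::isdigit(static_cast<unsigned char>(a[0])) != 0;
    const bool bDigit = std::isdigit(static_cast<unsigned char>(b[0])) != 0;
    if (aDigit && !bDigit)
        return true;
    if (!aDigit && bDigit)
        return false;
    if (!aDigit && !bDigit)
    {
        // Compare the leading characters case-insensitively, consistent with toUpper() below.
        if (std::toupper(static_cast<unsigned char>(a[0])) == std::toupper(static_cast<unsigned char>(b[0])))
            return compareNat(a.substr(1), b.substr(1));
        return (toUpper(a) < toUpper(b));
        //toUpper() is a function to convert a std::string to uppercase.
    }

    // Both strings begin with digit --> parse both numbers
    // (a digit run too large for int will not parse correctly)
    std::istringstream issa(a);
    std::istringstream issb(b);
    int ia, ib;
    issa >> ia;
    issb >> ib;
    if (ia != ib)
        return ia < ib;

    // Numbers are the same --> remove numbers and recurse
    std::string anew, bnew;
    std::getline(issa, anew);
    std::getline(issb, bnew);
    return (compareNat(anew, bnew));
}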

toUpper() function:

std::string toUpper(std::string s){
    // Cast to unsigned char: toupper is undefined for negative char values.
    for(int i=0;i<(int)s.length();i++){s[i]=(char)toupper((unsigned char)s[i]);}
    return s;
}

Usage:

std::vector<std::string> str;
str.push_back("abc1def");
str.push_back("abc10def");
...
std::sort(str.begin(), str.end(), compareNat);

edited May 23, 2017 at 12:18

CommunityBot

answered Nov 15, 2015 at 20:36

Jahid

1 Comment

Jahid Over a year ago

This is not very efficient; a more efficient and comprehensive solution is this one

Jan-Marten Spit · Accepted Answer · 2015-12-29 16:04:52Z

0

To solve what is essentially a parsing problem, a state machine (a.k.a. finite state automaton) is the way to go. Dissatisfied with the above solutions, I wrote a simple one-pass, early-bail-out algorithm that beats the C/C++ variants suggested above in terms of performance, does not suffer from numerical datatype overflow errors, and is easy to modify to add case insensitivity if required.

Sources can be found here
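Since the linked sources are not reproduced here, the following is only a sketch of what a one-pass, early-bail-out comparison might look like, not the author's code. The function name `naturalLess` and the `ignoreCase` parameter are assumptions made for this example. It walks both strings once with two indices, switches between a "text" state and a "number" state, allocates nothing, and never converts digits to an integer.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if a sorts before b in natural order.
// Hypothetical name and signature, for illustration only.
bool naturalLess(const std::string& a, const std::string& b, bool ignoreCase) {
    auto isDigit = [](char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    };
    auto fold = [ignoreCase](char c) -> unsigned char {
        const unsigned char u = static_cast<unsigned char>(c);
        return ignoreCase ? static_cast<unsigned char>(std::tolower(u)) : u;
    };

    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (isDigit(a[i]) && isDigit(b[j])) {
            // Number state: skip leading zeros on both sides.
            while (i < a.size() && a[i] == '0') ++i;
            while (j < b.size() && b[j] == '0') ++j;
            // Walk both digit runs in lockstep, remembering the first difference.
            int firstDiff = 0;
            while (i < a.size() && j < b.size() && isDigit(a[i]) && isDigit(b[j])) {
                if (firstDiff == 0) firstDiff = a[i] - b[j];
                ++i;
                ++j;
            }
            const bool moreA = i < a.size() && isDigit(a[i]);
            const bool moreB = j < b.size() && isDigit(b[j]);
            if (moreA != moreB) return moreB;  // the longer digit run is the larger number
            if (firstDiff != 0) return firstDiff < 0;
        } else {
            // Text state: bail out at the first differing character.
            const unsigned char ca = fold(a[i]);
            const unsigned char cb = fold(b[j]);
            if (ca != cb) return ca < cb;
            ++i;
            ++j;
        }
    }
    // One string is exhausted: it sorts first only if the other has characters left.
    return i == a.size() && j < b.size();
}
```

With `std::sort`, the flag can be bound in a lambda, for example `std::sort(v.begin(), v.end(), [](const std::string& x, const std::string& y) { return naturalLess(x, y, true); });`. In this sketch, strings that differ only in leading zeros (such as "z01" and "z1") compare as equivalent.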

answered Dec 29, 2015 at 16:04

Jan-Marten Spit

3 Comments

Danh Over a year ago

Please post your code here instead of asking people to go to your personal website.

Jan-Marten Spit Over a year ago

My personal website is where it is maintained, and where it will be. Glad I could be of help to others.

Jean-Michaël Celerier Over a year ago

@Jan-MartenSpit your website is dead :(

joesdiner · Accepted Answer · 2020-08-05 15:48:10Z

0

For those that arrive here and are already using Qt in their project, you can use the QCollator class. See this question for details.

answered Aug 5, 2020 at 15:48

joesdiner


padina · Accepted Answer · 2020-08-08 06:43:33Z

Avalanchesort is a recursive variation of natural sort that merges runs while exploring the stack of data to be sorted. The algorithm sorts stably, even if you add data to the sorting heap while the algorithm is running.

The search principle is simple: only merge runs with the same rank.

After finding the first two natural runs (rank 0), avalanchesort merges them into a run with rank 1. Then it calls avalanchesort to generate a second run with rank 1 and merges the two runs into a run with rank 2. Then it calls avalanchesort to generate a run with rank 2 on the unsorted data, and so on.

My implementation, porthd/avalanchesort, separates the sorting from the handling of the data using interface injection. You can use the algorithm for data structures such as arrays, associative arrays, or lists.

/**
 * @param DataListAvalancheSortInterface $dataList
 * @return DataRangeInterface
 */
public function startAvalancheSort(DataListAvalancheSortInterface $dataList)
{
    $avalancheIndex = 0;
    $rangeResult = $this->avalancheSort($dataList, $dataList->getFirstIdent(), $avalancheIndex);
    if (!$dataList->isLastIdent($rangeResult->getStop())) {
        do {
            $avalancheIndex++;
            $lastIdent = $rangeResult->getStop();
            if ($dataList->isLastIdent($lastIdent)) {
                $rangeResult = new $this->rangeClass();
                $rangeResult->setStart($dataList->getFirstIdent());
                $rangeResult->setStop($dataList->getLastIdent());
                break;
            }
            $nextIdent = $dataList->getNextIdent($lastIdent);
            $rangeFollow = $this->avalancheSort($dataList, $nextIdent, $avalancheIndex);
            $rangeResult = $this->mergeAvalanche($dataList, $rangeResult, $rangeFollow);
        } while (true);
    }
    return $rangeResult;
}

/**
 * @param DataListAvalancheSortInterface $dataList
 * @param $startIdent
 * @return DataRangeInterface
 */
protected function findRun(DataListAvalancheSortInterface $dataList,
                           $startIdent)
{
    $result = new $this->rangeClass();
    $result->setStart($startIdent);
    $result->setStop($startIdent);
    do {
        if ($dataList->isLastIdent($result->getStop())) {
            break;
        }
        $nextIdent = $dataList->getNextIdent($result->getStop());
        if ($dataList->oddLowerEqualThanEven(
            $dataList->getDataItem($result->getStop()),
            $dataList->getDataItem($nextIdent)
        )) {
            $result->setStop($nextIdent);
        } else {
            break;
        }
    } while (true);
    return $result;
}

/**
 * @param DataListAvalancheSortInterface $dataList
 * @param $beginIdent
 * @param int $avalancheIndex
 * @return DataRangeInterface|mixed
 */
protected function avalancheSort(DataListAvalancheSortInterface $dataList,
                                 $beginIdent,
                                 int $avalancheIndex = 0)
{
    if ($avalancheIndex === 0) {
        $rangeFirst = $this->findRun($dataList, $beginIdent);
        if ($dataList->isLastIdent($rangeFirst->getStop())) {
            // it is the last run
            $rangeResult = $rangeFirst;
        } else {
            $nextIdent = $dataList->getNextIdent($rangeFirst->getStop());
            $rangeSecond = $this->findRun($dataList, $nextIdent);
            $rangeResult = $this->mergeAvalanche($dataList, $rangeFirst, $rangeSecond);
        }
    } else {
        $rangeFirst = $this->avalancheSort($dataList,
            $beginIdent,
            ($avalancheIndex - 1)
        );
        if ($dataList->isLastIdent($rangeFirst->getStop())) {
            $rangeResult = $rangeFirst;
        } else {
            $nextIdent = $dataList->getNextIdent($rangeFirst->getStop());
            $rangeSecond = $this->avalancheSort($dataList,
                $nextIdent,
                ($avalancheIndex - 1)
            );
            $rangeResult = $this->mergeAvalanche($dataList, $rangeFirst, $rangeSecond);
        }
    }
    return $rangeResult;
}

protected function mergeAvalanche(DataListAvalancheSortInterface $dataList, $oddListRange, $evenListRange)
{
    $resultRange = new $this->rangeClass();
    $oddNextIdent = $oddListRange->getStart();
    $oddStopIdent = $oddListRange->getStop();
    $evenNextIdent = $evenListRange->getStart();
    $evenStopIdent = $evenListRange->getStop();
    $dataList->initNewListPart($oddListRange, $evenListRange);
    do {
        if ($dataList->oddLowerEqualThanEven(
            $dataList->getDataItem($oddNextIdent),
            $dataList->getDataItem($evenNextIdent)
        )) {
            $dataList->addListPart($oddNextIdent);
            if ($oddNextIdent === $oddStopIdent) {
                $restTail = $evenNextIdent;
                $stopTail = $evenStopIdent;
                break;
            }
            $oddNextIdent = $dataList->getNextIdent($oddNextIdent);
        } else {
            $dataList->addListPart($evenNextIdent);
            if ($evenNextIdent === $evenStopIdent) {
                $restTail = $oddNextIdent;
                $stopTail = $oddStopIdent;
                break;
            }
            $evenNextIdent = $dataList->getNextIdent($evenNextIdent);

        }
    } while (true);
    while ($stopTail !== $restTail) {
        $dataList->addListPart($restTail);
        $restTail = $dataList->getNextIdent($restTail);
    }
    $dataList->addListPart($restTail);
    $dataList->cascadeDataListChange($resultRange);
    return $resultRange;

}

}
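For readers following the thread in C++, the rank-based merging described above can be sketched roughly as follows. This is a simplified illustration, not a port of the PHP code: the names `avalancheRun` and `avalancheSort` are hypothetical, the data is assumed to sit in a range of bidirectional iterators, and the merging is delegated to `std::inplace_merge`.

```cpp
#include <algorithm>

// Hypothetical helper: sorts a prefix of [first, last) that covers up to
// 2^rank natural runs and returns the end of that sorted prefix.
template <class It>
It avalancheRun(It first, It last, int rank) {
    if (first == last) return last;
    if (rank == 0) {
        // A natural run is a maximal already-sorted stretch of the input.
        return std::is_sorted_until(first, last);
    }
    const It mid = avalancheRun(first, last, rank - 1);
    if (mid == last) return last;         // ran out of data
    const It end = avalancheRun(mid, last, rank - 1);
    std::inplace_merge(first, mid, end);  // stable merge of two runs of equal rank
    return end;
}

// Hypothetical entry point: keeps merging the sorted prefix with a freshly
// built run of the same rank until the whole range is sorted.
template <class It>
void avalancheSort(It first, It last) {
    It sortedEnd = avalancheRun(first, last, 0);
    for (int rank = 0; sortedEnd != last; ++rank) {
        const It nextEnd = avalancheRun(sortedEnd, last, rank);
        std::inplace_merge(first, sortedEnd, nextEnd);
        sortedEnd = nextEnd;
    }
}
```

A call such as `avalancheSort(v.begin(), v.end());` on a `std::vector<int> v` sorts it in ascending order. Input that is already sorted is detected as a single run and needs no merging.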

Shouheng Wang · Accepted Answer · 2021-07-01 02:43:56Z

My algorithm, with test code, in Java. If you want to use it in your project, you can define the comparator yourself.

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

public class FileNameSortTest {

    private static List<String> names = Arrays.asList(
            "A__01__02",
            "A__2__02",
            "A__1__23",
            "A__11__23",
            "A__3++++",
            "B__1__02",
            "B__22_13",
            "1_22_2222",
            "12_222_222",
            "2222222222",
            "1.sadasdsadsa",
            "11.asdasdasdasdasd",
            "2.sadsadasdsad",
            "22.sadasdasdsadsa",
            "3.asdasdsadsadsa",
            "adsadsadsasd1",
            "adsadsadsasd10",
            "adsadsadsasd3",
            "adsadsadsasd02"
    );

    public static void main(String...args) {
        List<File> files = new ArrayList<>();
        names.forEach(s -> {
            File f = new File(s);
            try {
                if (!f.exists()) {
                    f.createNewFile();
                }
                files.add(f);
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        files.sort(Comparator.comparing(File::getName));
        files.forEach(f -> System.out.print(f.getName() + " "));
        System.out.println();

        files.sort(new Comparator<File>() {

            boolean caseSensitive = false;
            int SPAN_OF_CASES = 'a' - 'A';

            @Override
            public int compare(File left, File right) {
                char[] csLeft = left.getName().toCharArray(), csRight = right.getName().toCharArray();
                boolean isNumberRegion = false;
                int diff=0, i=0, j=0, lenLeft=csLeft.length, lenRight=csRight.length;
                char cLeft = 0, cRight = 0;
                for (; i<lenLeft && j<lenRight; i++, j++) {
                    cLeft = getCharByCaseSensitive(csLeft[i]);
                    cRight = getCharByCaseSensitive(csRight[j]);
                    boolean isNumericLeft = isNumeric(cLeft), isNumericRight = isNumeric(cRight);
                    if (isNumericLeft && isNumericRight) {
                        // Number start!
                        if (!isNumberRegion) {
                            isNumberRegion = true;
                            // Remove prefix '0'
                            while (i < lenLeft && cLeft == '0') i++;
                            while (j < lenRight && cRight == '0') j++;
                            if (i == lenLeft || j == lenRight) break;
                        }
                        // Diff start: calculate the diff value.
                        if (cLeft != cRight && diff == 0)
                            diff = cLeft - cRight;
                    } else {
                        if (isNumericLeft != isNumericRight) {
                            // One numeric and one char.
                            if (isNumberRegion)
                                return isNumericLeft ? 1 : -1;
                            return cLeft - cRight;
                        } else {
                            // Two chars: if (number) diff don't equal 0 return it.
                            if (diff != 0)
                                return diff;
                            // Calculate chars diff.
                            diff = cLeft - cRight;
                            if (diff != 0)
                                return diff;
                            // Reset!
                            isNumberRegion = false;
                            diff = 0;
                        }
                    }
                }
                // The longer one will be put backwards.
                return (i == lenLeft && j == lenRight) ? cLeft - cRight : (i == lenLeft ? -1 : 1) ;
            }

            private boolean isNumeric(char c) {
                return c >= '0' && c <= '9';
            }

            private char getCharByCaseSensitive(char c) {
                return caseSensitive ? c : (c >= 'A' && c <= 'Z' ? (char) (c + SPAN_OF_CASES) : c);
            }
        });
        files.forEach(f -> System.out.print(f.getName() + " "));
    }
}

The output is:

1.sadasdsadsa 11.asdasdasdasdasd 12_222_222 1_22_2222 2.sadsadasdsad 22.sadasdasdsadsa 2222222222 3.asdasdsadsadsa A__01__02 A__11__23 A__1__23 A__2__02 A__3++++ B__1__02 B__22_13 adsadsadsasd02 adsadsadsasd1 adsadsadsasd10 adsadsadsasd3 
1.sadasdsadsa 1_22_2222 2.sadsadasdsad 3.asdasdsadsadsa 11.asdasdasdasdasd 12_222_222 22.sadasdasdsadsa 2222222222 A__01__02 A__1__23 A__2__02 A__3++++ A__11__23 adsadsadsasd02 adsadsadsasd1 adsadsadsasd3 adsadsadsasd10 B__1__02 B__22_13 
Process finished with exit code 0

hu. · Accepted Answer · 2011-01-09 15:17:05Z

-1

#include <cctype>
#include <string>
using std::string;

// True if c is a decimal digit. The cast avoids undefined behaviour for negative char values.
static bool isDigit(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// -1: s0 < s1; 0: s0 == s1; 1: s0 > s1
static int numericCompare(const string &s0, const string &s1) {
    size_t i = 0, j = 0;
    for (; i < s0.size() && j < s1.size();) {
        // Take the next run of digits or of non-digits from each string.
        string t0(1, s0[i++]);
        while (i < s0.size() && isDigit(t0[0]) == isDigit(s0[i])) {
            t0.push_back(s0[i++]);
        }
        string t1(1, s1[j++]);
        while (j < s1.size() && isDigit(t1[0]) == isDigit(s1[j])) {
            t1.push_back(s1[j++]);
        }
        if (isDigit(t0[0]) && isDigit(t1[0])) {
            // Drop leading zeros; the longer remaining run is the larger number.
            size_t p0 = t0.find_first_not_of('0');
            size_t p1 = t1.find_first_not_of('0');
            t0 = p0 == string::npos ? "" : t0.substr(p0);
            t1 = p1 == string::npos ? "" : t1.substr(p1);
            if (t0.size() != t1.size()) {
                return t0.size() < t1.size() ? -1 : 1;
            }
        }
        if (t0 != t1) {
            return t0 < t1 ? -1 : 1;
        }
    }
    return i == s0.size() && j == s1.size() ? 0 : i != s0.size() ? 1 : -1;
}

I'm not sure if this is what you want, but you can give it a try :-)

edited Jan 9, 2011 at 15:17

answered Jan 9, 2011 at 15:00

hu.

2 Comments

Josh Kelley Over a year ago

This returns 0 for numericCompare("z01", "z1"), which doesn't seem desirable.

Nickolay Merkin Over a year ago

This algorithm uses extra memory: temporary strings. At least, you could use ranges (pairs of iterators) instead.
