Ordering a string by the count of substrings?

Question

I have a long list of numbers like this:

1234-212-22-11153782-0114232192380
8807698823332-6756-234-14-09867378
45323-14-221-238372635363-43676256
62736373-9983-23-234-8863345637388

. . . . 
. . . .

I would like to do two things:

1) Order the segments of each line by their number of digits. The output should look like this:

22-212-1234-11153782-0114232192380
14-234-6756-09867378-8807698823332
14-221-45323-43676256-238372635363
23-234-9983-62736373-8863345637388

2) Find the number of digits in each segment of every line. The output should be:

2-3-4-8-13
2-3-4-8-13
2-3-5-8-12
2-3-4-8-13

In this example the first, second and third segments of each line have the same lengths, but they could be different.

Are fields of identical length possible? How should these be arranged?
– RudiC, Commented Oct 3, 2018 at 12:36
This looks like homework. Are you allowed to use something besides bash, e.g. python?
– Hermann, Commented Oct 3, 2018 at 12:38
You mention Linux; can we assume a GNU/Linux environment for solutions?
– Jeff Schaller ♦, Commented Oct 3, 2018 at 12:38
Not sure why this would be homework. Could be log output from a game or a piece of custom hardware or anything really.
– pipe, Commented Oct 3, 2018 at 13:28
The original answer was about how to solve this in bash (only), hence I assumed the origin of the restriction.
– Hermann, Commented Oct 3, 2018 at 14:29

steeldriver · Accepted Answer · 2018-10-04 00:09:31Z

7

How about

$ perl -F'-' -lpe '$_ = join "-", sort { length $a <=> length $b } @F' file
22-212-1234-11153782-0114232192380
14-234-6756-09867378-8807698823332
14-221-45323-43676256-238372635363
23-234-9983-62736373-8863345637388

and

$ perl -F'-' -lpe '$_ = join "-", sort { $a <=> $b } map length, @F' file
2-3-4-8-13
2-3-4-8-13
2-3-5-8-12
2-3-4-8-13

Thanks to Stéphane Chazelas for suggested improvements

edited Oct 4, 2018 at 0:09

answered Oct 3, 2018 at 13:01

steeldriver

1

Better to do the map before the sort (perl -F- -lpe '$_=join "-", sort {$a<=>$b} map length, @F')
– Stéphane Chazelas, Commented Oct 3, 2018 at 16:40
@StéphaneChazelas thanks as always for the helpful improvements
– steeldriver, Commented Oct 4, 2018 at 0:10
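
The point of that comment can be sketched in Python. This is only an illustration, not part of the answer, and the function names are made up here: sorting the segments by length and then taking their lengths gives the same list as taking the lengths first and sorting plain integers, and the second form needs no custom sort key.

```python
def lengths_sort_then_map(line, sep="-"):
    """Sort the segments by length, then replace each segment by its length."""
    return [len(seg) for seg in sorted(line.split(sep), key=len)]


def lengths_map_then_sort(line, sep="-"):
    """Take each segment's length first, then sort the plain integers."""
    return sorted(len(seg) for seg in line.split(sep))


line = "1234-212-22-11153782-0114232192380"
print("-".join(map(str, lengths_map_then_sort(line))))  # prints 2-3-4-8-13
```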

Jeff Schaller · Accepted Answer · 2018-10-03 13:05:20Z

GNU awk can sort, so the trickiest part is deciding how to separate the two desired outputs; this script generates both results, and you can decide if you'd like them somewhere other than hard-coded output files:

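# Comparison function for gawk's sorting: order by the length of the values
function compare_length(i1, v1, i2, v2) {
  return (length(v1) - length(v2));
}

BEGIN {
  # gawk-specific: traverse "for (x in array)" loops in compare_length order
  PROCINFO["sorted_in"]="compare_length"
  FS="-"
}

{
  split($0, elements);                                 # split the line on FS
  asort(elements, sorted_elements, "compare_length");  # sort the segments by length
  reordered="";
  lengths="";
  for (element in sorted_elements) {
    # build both output lines, joining with FS and skipping the leading separator
    reordered=(reordered == "" ? "" : reordered FS) sorted_elements[element];
    lengths=(lengths == "" ? "" : lengths FS) length(sorted_elements[element]);
  }
  print reordered > "reordered.out";
  print lengths > "lengths.out";
}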
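
If hard-coded file names are not wanted, one option is to pass the two destinations in as parameters. The following is a minimal Python sketch of that idea, not a translation of the gawk script; `write_both` and its parameters are hypothetical names, and in-memory streams stand in for real files.

```python
import io


def write_both(lines, reordered_out, lengths_out, sep="-"):
    """Write length-sorted segments and their lengths to two separate streams."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        segments = sorted(line.split(sep), key=len)  # stable sort by length
        reordered_out.write(sep.join(segments) + "\n")
        lengths_out.write(sep.join(str(len(s)) for s in segments) + "\n")


# In-memory streams stand in for real files in this sketch.
reordered, lengths = io.StringIO(), io.StringIO()
write_both(["45323-14-221-238372635363-43676256\n"], reordered, lengths)
print(reordered.getvalue(), end="")  # 14-221-45323-43676256-238372635363
print(lengths.getvalue(), end="")    # 2-3-5-8-12
```

Any writable file objects work in place of the `io.StringIO` instances, for example two files opened with `open(..., "w")` or `sys.stdout` and `sys.stderr`.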

terdon · Accepted Answer · 2018-10-03 13:16:12Z

How far would this get you:

awk -F- '               # set "-" as the field separator
{
 for (i=1; i<=NF; i++){
   L    = length($i)    # for every single field, calc its length
   T[L] = $i            # and populate the T array with length as index
   if (L>MX){ MX = L }  # keep max length
 }                        
 $0 = ""                # empty line
 for (i=1; i<=MX; i++){
  if (T[i]){
   $0 = $0 OFS T[i]     # append each non-zero T element to the line, separated by "-"
   C  = C OFS i         # keep the field lengths in separate variable C
  }
 }
 print substr ($0, 2) "\t"  substr (C, 2)    # print the line and the field lengths, eliminating each first char
 C = MX = ""                                 # reset working variables
 split ("", T)                               # delete T array
}
' OFS=- file
22-212-1234-11153782-0114232192380  2-3-4-8-13
14-234-6756-09867378-8807698823332  2-3-4-8-13
14-221-45323-43676256-238372635363  2-3-5-8-12
23-234-9983-62736373-8863345637388  2-3-4-8-13

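You may want to split the printout into two result files. Note that T is indexed by field length, so if a line has two fields of the same length, only the last of them is kept.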
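
RudiC's comment under the question asks whether fields of identical length are possible. A small Python sketch shows why that matters for an approach that indexes by length. The input line and both function names are hypothetical, and the first function only mimics the indexing idea rather than reproducing the awk script.

```python
def order_indexed_by_length(line, sep="-"):
    """Mimic an array indexed by length: a later field overwrites an earlier one of equal length."""
    by_length = {}
    for field in line.split(sep):
        by_length[len(field)] = field
    return sep.join(by_length[n] for n in sorted(by_length))


def order_keeping_ties(line, sep="-"):
    """Stable sort by length: equal-length fields are all kept, in input order."""
    return sep.join(sorted(line.split(sep), key=len))


line = "567-12-34"  # hypothetical input with two fields of length 2
print(order_indexed_by_length(line))  # 34-567
print(order_keeping_ties(line))       # 12-34-567
```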

user1717828 · Accepted Answer · 2018-10-03 19:04:02Z

Is Python OK? If so, put your strings in numbers.txt and run:

with open('numbers.txt') as f:
    for string in f.read().splitlines():
        print('-'.join(sorted(string.split('-'), key=len)))

22-212-1234-11153782-0114232192380
14-234-6756-09867378-8807698823332
14-221-45323-43676256-238372635363
23-234-9983-62736373-8863345637388

The magic here is the key parameter of sorted taking the length function. For the counting use case, do

with open('numbers.txt') as f:
    for string in f.read().splitlines():
        segments = sorted(string.split('-'), key=len)
        print('-'.join(str(len(segment)) for segment in segments))

2-3-4-8-13
2-3-4-8-13
2-3-5-8-12
2-3-4-8-13

This is the same code, except that it takes the length of each segment and converts that length to a string for joining.

glenn jackman · Accepted Answer · 2018-10-03 16:34:55Z

-1

With a bash pipeline, you could write

while IFS=- read -ra words; do 
    for word in "${words[@]}"; do printf "%d\t%s\n" "${#word}" "$word"; done | 
    sort -k1,1n | 
    cut -f2 | 
    paste -sd-
done < file
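
The pipeline follows a decorate, sort, undecorate pattern: prefix each word with its length, sort on that prefix, then strip it again. A rough Python sketch of the same pattern follows; the function name is made up for illustration. Python's sort is stable, so equal-length words keep their input order, whereas sort(1) may order such ties differently.

```python
def sort_segments_dsu(line, sep="-"):
    """Decorate each word with its length, sort on that key, then drop the key."""
    decorated = [(len(word), word) for word in line.split(sep)]  # like printf "%d\t%s\n"
    decorated.sort(key=lambda pair: pair[0])                     # like sort -k1,1n
    return sep.join(word for _, word in decorated)               # like cut -f2 | paste -sd-


print(sort_segments_dsu("62736373-9983-23-234-8863345637388"))
# prints 23-234-9983-62736373-8863345637388
```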

answered Oct 3, 2018 at 16:34

glenn jackman
