5

This might be heavily related to similar questions as Python 3.3: Split string and create all combinations , but I can't infer a pythonic solution out of this.

Question is:

Let there be a str such as 'hi|guys|whats|app', and I need all permutations of splitting that str by a separator. Example:

#splitting only once
['hi','guys|whats|app']
['hi|guys','whats|app']
['hi|guys|whats','app']
#splitting only twice
['hi','guys','whats|app']
['hi','guys|whats','app']
#splitting only three times
...
etc

I could write a backtracking algorithm, but does python (itertools, e.g.) offer a library that simplifies this algorithm?

Thanks in advance!!

7
  • you can do it using .split("|", n) where n is the max number of splits you need Commented Oct 13, 2021 at 12:32
  • 4
    that won't do what he wants. Commented Oct 13, 2021 at 12:32
  • 1
    @MohamedYahya yeah, but it'll also return like ['hi','guys','whats|app'] Commented Oct 13, 2021 at 12:33
  • yeah yeah, I see what he needs, thanks👍 Commented Oct 13, 2021 at 12:40
  • 1
    the list of possible splits is just the binary form of range(2**number_of_separators) you should be able to do the rest Commented Oct 13, 2021 at 13:17

5 Answers 5

1

An approach, once you have split the string is to use itertools.combinations to define the split points in the list, the other positions should be fused again.

def lst_merge(lst, positions, sep='|'):
    '''merges a list on points other than positions'''
    '''A, B, C, D and 0, 1 -> A, B, C|D'''
    a = -1
    out = []
    for b in list(positions)+[len(lst)-1]:
        out.append('|'.join(lst[a+1:b+1]))
        a = b
    return out

def split_comb(s, split=1, sep='|'):
    from itertools import combinations
    l = s.split(sep)
    return [lst_merge(l, pos, sep=sep)
            for pos in combinations(range(len(l)-1), split)]

examples

>>> split_comb('hi|guys|whats|app', 0)
[['hi|guys|whats|app']]

>>> split_comb('hi|guys|whats|app', 1)
[['hi', 'guys|whats|app'],
 ['hi|guys', 'whats|app'],
 ['hi|guys|whats', 'app']]

>>> split_comb('hi|guys|whats|app', 2)
[['hi', 'guys', 'whats|app'],
 ['hi', 'guys|whats', 'app'],
 ['hi|guys', 'whats', 'app']]

>>> split_comb('hi|guys|whats|app', 3)
[['hi', 'guys', 'whats', 'app']]

>>> split_comb('hi|guys|whats|app', 4)
[] ## impossible

rationale

ABCD -> A B C D
         0 1 2

combinations of split points: 0/1 or 0/2 or 1/2

0/1 -> merge on 2 -> A B CD
0/2 -> merge on 1 -> A BC D
1/2 -> merge on 0 -> AB C D

generic function

Here is a generic version, working like above but also taking -1 as parameter for split, in which case it will output all combinations

def lst_merge(lst, positions, sep='|'):
    a = -1
    out = []
    for b in list(positions)+[len(lst)-1]:
        out.append('|'.join(lst[a+1:b+1]))
        a = b
    return out

def split_comb(s, split=1, sep='|'):
    from itertools import combinations, chain
    
    l = s.split(sep)
    
    if split == -1:
        pos = chain.from_iterable(combinations(range(len(l)-1), r)
                                  for r in range(len(l)+1))
    else:
        pos = combinations(range(len(l)-1), split)
        
    return [lst_merge(l, pos, sep=sep)
            for pos in pos]

example:

>>> split_comb('hi|guys|whats|app', -1)
[['hi|guys|whats|app'],
 ['hi', 'guys|whats|app'],
 ['hi|guys', 'whats|app'],
 ['hi|guys|whats', 'app'],
 ['hi', 'guys', 'whats|app'],
 ['hi', 'guys|whats', 'app'],
 ['hi|guys', 'whats', 'app'],
 ['hi', 'guys', 'whats', 'app']]
Sign up to request clarification or add additional context in comments.

2 Comments

You're very welcome @glezo Out of curiosity, what is the use case for this code?
Hard one: csv delimited by pipes ("|"), having some fields pipe characters cannot be inserted into a pandas dataframe. So, i'll try to tokenize these faulty rows into all possible permutations and insert them to pandas' dataframe =D
0

Here is a recursive function I came up with:

def splitperms(string, i=0):
    if len(string) == i:
        return [[string]]
    elif string[i] == "|":
        return [*[[string[:i]] + split for split in splitperms(string[i + 1:])], *splitperms(string, i + 1)]
    else:
        return splitperms(string, i + 1)

Output:

>>> splitperms('hi|guys|whats|app')
[['hi', 'guys', 'whats', 'app'], ['hi', 'guys', 'whats|app'], ['hi', 'guys|whats', 'app'], ['hi', 'guys|whats|app'], ['hi|guys', 'whats', 'app'], ['hi|guys', 'whats|app'], ['hi|guys|whats', 'app'], ['hi|guys|whats|app']]
>>> 

5 Comments

isn't your function supposed to give only the splits for a given number of splits? this yields all
I guess that's OK
@mozway Let's see what the OP thinks about it
Yes, I realize the interpretation was subjective
Indeed, what I searched for was to filter by number of tokens in the output (but to simplify the question, I planned not to specify that and filter the whole output). Nice solution, too! =D
0

One approach using combinations and chain

from itertools import combinations, chain


def partition(alist, indices):
    # https://stackoverflow.com/a/1198876/4001592
    pairs = zip(chain([0], indices), chain(indices, [None]))
    return (alist[i:j] for i, j in pairs)


s = 'hi|guys|whats|app'
delimiter_count = s.count("|")
splits = s.split("|")


for i in range(1, delimiter_count + 1):
    print("split", i)
    for combination in combinations(range(1, delimiter_count + 1), i):
        res = ["|".join(part) for part in partition(splits, combination)]
        print(res)

Output

split 1
['hi', 'guys|whats|app']
['hi|guys', 'whats|app']
['hi|guys|whats', 'app']
split 2
['hi', 'guys', 'whats|app']
['hi', 'guys|whats', 'app']
['hi|guys', 'whats', 'app']
split 3
['hi', 'guys', 'whats', 'app']

The idea is to generate all the ways to pick (or remove) a delimiter 1, 2, 3 times and generate the partitions from there.

Comments

0

If you want all partitions, try partitions from more-itertools:

from more_itertools import partitions

s = 'hi|guys|whats|app'

for p in partitions(s.split('|')):
    print(list(map('|'.join, p)))

Output:

['hi|guys|whats|app']
['hi', 'guys|whats|app']
['hi|guys', 'whats|app']
['hi|guys|whats', 'app']
['hi', 'guys', 'whats|app']
['hi', 'guys|whats', 'app']
['hi|guys', 'whats', 'app']
['hi', 'guys', 'whats', 'app']

If you only want a certain number of splits, then instead of splitting at all separators and then re-joining the parts, you could just get combinations of the separator indexes and take substrings accordingly:

from itertools import combinations

s = 'hi|guys|whats|app'
splits = 2

indexes = [i for i, c in enumerate(s) if c == '|']
for I in combinations(indexes, splits):
    print([s[i+1:j] for i, j in zip([-1, *I], [*I, None])])

Output:

['hi', 'guys', 'whats|app']
['hi', 'guys|whats', 'app']
['hi|guys', 'whats', 'app']

Comments

0

I am surprised that most answers are using combinations, where this is clearly a binary power sequence (that is, multiple binary cartesian products concatenated).

Let me elaborate: if we have n separators, we have 2**n possible strings, where each separator is either on or off. So if we map each bit of an integer sequence from 0 to 2**n to each separator (0 means we don't split, 1 means we split) we can generate the whole thing quite efficiently (without running into stack depth limits, and being able to pause and resume the generator -or even run it in parallel!- using just a simple integer to keep track of progress).

def partition(index, tokens, separator):
    def helper():
        n = index
        for token in tokens:
            yield token
            if n % 2:
                yield separator
            n //= 2
    return ''.join(helper())

def all_partitions(txt, separator):
    tokens = txt.split(separator)
    for i in range(2**(len(tokens)-1)):
        yield partition(i, tokens, separator)

for x in all_partitions('hi|guys|whats|app', '|'):
    print(x)

Explanation:

   hi|guys|whats|app
     ^    ^     ^
bit  0    1     2 (big endian representation)
   hi guys whats up
     ^    ^     ^
0 =  0    0     0


   hi|guys whats up
     ^    ^     ^
1 =  1    0     0


   hi guys|whats up
     ^    ^     ^
2 =  0    1     0


   hi|guys|whats up
     ^    ^     ^
3 =  1    1     0


   hi guys whats|up
     ^    ^     ^
4 =  0    0     1


   hi|guys whats|up
     ^    ^     ^
5 =  1    0     1


   hi guys|whats|up
     ^    ^     ^
6 =  0    1     1


   hi|guys|whats|up
     ^    ^     ^
7 =  1    1     1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.