
I am trying to come up with a way to generate all possible unique strings from an alphabet of 20 characters where the order within the string doesn't matter, and the length of the string can vary. So, for instance, for a string of length 3, the possible strings would be AAA, AAB, AAC, etc., but would not include BAA or CAA. I figured out a way using itertools.product(), but it is very computationally expensive. The easiest way to do this is simply using nested for loops. For instance, to generate all strings of length four:

alphabet = ["A","C","D","E","F","G","H","I","K","L",
            "M","N","P","Q","R","S","T","V","W","Y"]
combos = []
for a in range(len(alphabet)):
    for b in range(a,len(alphabet)):
        for c in range(b,len(alphabet)):
            for d in range(c,len(alphabet)):
                combos.append(alphabet[a] + alphabet[b] + alphabet[c] + alphabet[d])

Now, this can easily be done for a string of any length by changing the number of for loops. Given that the for loop sequence itself is quite predictable, is there a way to simplify this code instead of running three for loops if length == 3 and four for loops if length == 4? The only way I can think to do it right now is a bunch of if-elif statements:

if length == 3:
    for a in range(len(alphabet)):
        for b in range(a,len(alphabet)):
            for c in range(b,len(alphabet)):
                combos.append(alphabet[a] + alphabet[b] + alphabet[c])
elif length == 4:
    for a in range(len(alphabet)):
        for b in range(a,len(alphabet)):
            for c in range(b,len(alphabet)):
                for d in range(c,len(alphabet)):
                    combos.append(alphabet[a] + alphabet[b] + alphabet[c] + alphabet[d])

Is there any easier way than just covering a bunch of possible values of length?

  • 3
    Can you say more about your attempted/failed solution using itertools.product? Your way should be far more computationally expensive. Commented Aug 17, 2015 at 21:19
  • 1
    @Two-BitAlchemist: no, the OP's code is better, because it generates only the ones he needs. Using product, in the 4-letter case, you'd be throwing away 151145/160000 of the results. Commented Aug 17, 2015 at 21:35
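
The fraction quoted in the comment above can be checked with a little counting. This is a minimal sketch, assuming Python 3.8+ for `math.comb`; the variable names are illustrative and do not come from the original post.

```python
from math import comb

n = 20  # alphabet size from the question
k = 4   # string length

ordered = n ** k                # every ordered string product() would generate
unordered = comb(n + k - 1, k)  # multisets of size k drawn from n letters
wasted = ordered - unordered    # results a product-based filter has to discard

print(ordered, unordered, wasted)  # 160000 8855 151145
```

So only 8855 of the 160000 ordered strings are actually wanted for length 4.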

2 Answers


IIUC, you can simply use itertools.combinations_with_replacement.

>>> from itertools import combinations_with_replacement
>>> list(map(''.join, combinations_with_replacement(["a","b","c"],2)))
['aa', 'ab', 'ac', 'bb', 'bc', 'cc']
>>> list(map(''.join, combinations_with_replacement(["a","b","c"],3)))
['aaa', 'aab', 'aac', 'abb', 'abc', 'acc', 'bbb', 'bbc', 'bcc', 'ccc']
>>> list(map(''.join, combinations_with_replacement(alphabet,4))) == orig(alphabet)
True

(where orig is simply your original code wrapped into a function).
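
For reference, the comparison above could be set up as a standalone script roughly like this. It is only a sketch: `orig` wraps the four nested loops from the question as described, while `unordered_strings` is a hypothetical helper name that does not appear in the original post.

```python
from itertools import combinations_with_replacement

alphabet = ["A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
            "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y"]

def orig(alphabet):
    """The question's four nested loops, wrapped in a function."""
    combos = []
    n = len(alphabet)
    for a in range(n):
        for b in range(a, n):
            for c in range(b, n):
                for d in range(c, n):
                    combos.append(alphabet[a] + alphabet[b]
                                  + alphabet[c] + alphabet[d])
    return combos

def unordered_strings(alphabet, length):
    """Hypothetical helper: every order-insensitive string of this length."""
    return ["".join(combo)
            for combo in combinations_with_replacement(alphabet, length)]

# The length is an ordinary argument, so no if/elif ladder is needed.
assert unordered_strings(alphabet, 4) == orig(alphabet)

for length in (1, 2, 3):
    print(length, len(unordered_strings(alphabet, length)))
```

The loop at the end prints the number of strings for lengths 1 to 3, which should be 20, 210 and 1540 for a 20-letter alphabet.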


5 Comments

I was just going to answer the same; product would give completely different output.
@PadraicCunningham: then I'm not sure I understand your comment question to the OP -- the OP's code is more efficient than product because he generates only the ones he needs, rather than looking at all of them and filtering away the ones he doesn't want.
I did not look at the OP's code initially, I presumed they wanted the product considering they were using product
I'm afraid I did the same thing. I made assumptions based on the question text and not the code.
This is exactly what I was looking for, thank you. I timed the different implementations to see how they compared with strings of length 6: itertools.product(): 72.5235 s, manual nest: 0.1549 s, itertools.combinations_with_replacement(): 0.1536 s
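
A timing comparison along these lines could be reproduced with `timeit`. The sketch below rests on an assumption: the product-based code was never shown in the thread, so `via_product` is just one plausible way to do it (build every ordered string, then keep a single sorted form of each). Both function names are hypothetical, and the length is kept at 3 so the product version finishes quickly.

```python
from itertools import combinations_with_replacement, product
from timeit import timeit

alphabet = "ACDEFGHIKLMNPQRSTVWY"  # the question's 20 letters, already sorted
length = 3  # small on purpose; product() grows as 20 ** length

def via_product(alphabet, length):
    # Assumed approach: generate every ordered string, sort its letters,
    # and use a set to drop the duplicates that result.
    return sorted({"".join(sorted(p)) for p in product(alphabet, repeat=length)})

def via_cwr(alphabet, length):
    # Generates each order-insensitive string exactly once.
    return ["".join(c) for c in combinations_with_replacement(alphabet, length)]

# Both approaches should agree before their timings are compared.
assert via_product(alphabet, length) == via_cwr(alphabet, length)

t_product = timeit(lambda: via_product(alphabet, length), number=3)
t_cwr = timeit(lambda: via_cwr(alphabet, length), number=3)
print(f"product: {t_product:.4f} s, combinations_with_replacement: {t_cwr:.4f} s")
```

Absolute numbers will vary by machine, and the gap widens quickly as the length increases.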
  1. The code for itertools.product does exactly what you want and is much more efficient than nested loops.

  2. I suspect that what you really want is itertools.combinations_with_replacement.
