1

The task is to obtain a unique list of substrings in python.

I am currently using the breakup of the problem into 2 parts: obtain a list of all the substrings, followed by obtaining unique substrings.

I am using the below code:

substrings=[]
for i in range(0,len(inputstring)+1):
    for j in range(i+1,len(inputstring)+1):
        substr=inputstring[i:j]
        substrings.append(substr)
uniq=[]
for ss in substrings:
    if ss not in uniq:
        uniq.append(ss)

Is there a faster way to solve this problem or a so-called python way of doing it in a more flexible way?

a simple example string being: "aabaa", possible substrings are [a,a,b,a,a,aa,ab,ba,aa,aab,aba,baa,aaba,abaa,aabaa], unique substring which is desired at the end [a,b,aa,ab,ba,aab,aba,baa,aaba,abaa,aabaa]

5
  • start with posting inputstring value Commented Jul 24, 2019 at 14:13
  • What are example inputs and outputs? Commented Jul 24, 2019 at 14:13
  • 3
    This is highly Google-able Commented Jul 24, 2019 at 14:14
  • Make uniq a set which will make the if ss not in uniq execute much faster. Also see Getting all combinations of a string and its substrings. Commented Jul 24, 2019 at 14:14
  • @Austin, I just edited the question to contain sample input and expected results Commented Jul 24, 2019 at 14:34

2 Answers 2

1

Use Itertools and Set. Similar to the answer of Edwin but with Itertools, and in one line.

import itertools

uniq=list(set([inputstring[x:y] for x, y in itertools.combinations(
            range(len(inputstring) + 1), r = 2)]))

Basically you use itertools to first find all combinations, then set to find unique elements, then cast to list.

Code for combinations taken from https://www.geeksforgeeks.org/python-get-all-substrings-of-given-string/

Edit for a clearer explanation: First, use combinations to get all pair of indexes corresponding to substrings. The trick here is that itertools.combinations starts with all (0,X) pairs, and then (1,X) pairs, etc. Since we are using combinations and not permutations, we eliminate thus reverse substrings such as (1,0) since they will have been seen in the (0,X) enumerations.

Then simply use these with a list comprehension to get all substrings, use a set to find unique elements, and cast to a list.

Hope that helps

Sign up to request clarification or add additional context in comments.

3 Comments

Yes, but if you use itertools with permutations it will also return subsets like (1,0), which is undesirable in this case. Since in combinations order doesn't matter, and due to the way itertools.combinations iterates (starts with all (0,X), then (1,X), etc), combinations here will correctly return all substring
combination order matters as the permuted combination need not be the substring
I don't understand that sentence. I'll edit to explain the code more maybe that will help
0

Use a set instead of a list for the second part. Finding something in a list costs O(n) while in a set it costs O(1) and you don't have to check if its new. Sets won't add something if it is already in the list.

uniq=set()
for i in range(0,len(inputstring)+1):
    for j in range(i+1,len(inputstring)+1):
        substr=inputstring[i:j]
        uniq.add(substr)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.