Find longest unique substring in string python

Question

I am trying that age old question (there are multitudes of versions around) of finding the longest substring of a string which doesn't contain repeated characters. I can't work out why my attempt doesn't work properly:

def findLongest(inputStr):
    resultSet = []
    substr = []

    for c in inputStr:
        print ("c: ", c)
        if substr == []:
            substr.append([c])
            continue

        print(substr)
        for str in substr:
            print ("c: ",c," - str: ",str,"\n")
            if c in str:
                resultSet.append(str)
                substr.remove(str)
            else:
                str.append(c)
        substr.append([c])



    print("Result set:")
    print(resultSet)
    return max(resultSet, key=len)

print (findLongest("pwwkewambb"))

When my output gets to the second 'w', it doesn't iterate over all the substr elements. I think I've done something silly, but I can't see what it is so some guidance would be appreciated! I feel like I'm going to kick myself at the answer...

The beginning of my output:

c:  p
c:  w
[['p']]
c:  w  - str:  ['p']

c:  w
[['p', 'w'], ['w']]
c:  w  - str:  ['p', 'w'] # I expect the next line to say c: w - str: ['w']

c:  k
[['w'], ['w']] # it is like the w was ignored as it is here
c:  k  - str:  ['w']

c:  k  - str:  ['w']
...

EDIT:

I replaced the for loop with

for idx, str in enumerate(substr):
    print ("c: ",c," - str: ",str,"\n")
    if c in str:
        resultSet.append(str)
        substr[idx] = []
    else:
        str.append(c)

and it produces the correct result. The only thing is that the empty element arrays get set with the next character. It seems a bit pointless; there must be a better way.

My expected output is kewamb.

e.g.

c:  p
c:  w
[['p']]
c:  w  - str:  ['p']

c:  w
[['p', 'w'], ['w']]
c:  w  - str:  ['p', 'w']

c:  w  - str:  ['w']

c:  k
[[], [], ['w']]
c:  k  - str:  []

c:  k  - str:  []

c:  k  - str:  ['w']

c:  e
[['k'], ['k'], ['w', 'k'], ['k']]
c:  e  - str:  ['k']

c:  e  - str:  ['k']

c:  e  - str:  ['w', 'k']

c:  e  - str:  ['k']
...

ah really? didn't know that. I tried using str = [] before and that didn't work so starting using remove — dgBP
– dgBP, Commented Oct 6, 2017 at 20:30
am I thinking about this the wrong way - is there a more intuitive solution? — dgBP
– dgBP, Commented Oct 6, 2017 at 20:31
Ah, yes. I'll edit it. I tried out something which got the expected output, but isn't a perfect solution. — dgBP
– dgBP, Commented Oct 6, 2017 at 20:45

salparadise · Accepted Answer · 2019-01-07 18:25:46Z

2

Edit, per comment by @seymour on incorrect responses:

def find_longest(s):
    _longest = set()
    def longest(x):
         if x in _longest:
             _longest.clear()
             return False
         _longest.add(x)
         return True
    return ''.join(max((list(g) for _, g in groupby(s, key=longest)), key=len))

And test:

In [101]: assert find_longest('pwwkewambb') == 'kewamb'

In [102]: assert find_longest('abcabcbb') == 'abc'

In [103]: assert find_longest('abczxyabczxya') == 'abczxy'

Old answer:

from itertools import groupby

s = set() ## for mutable access

''.join(max((list(g) for _, g in groupby('pwwkewambb', key=lambda x: not ((s and x == s.pop()) or s.add(x)))), key=len))
'kewamb'

groupby returns an iterator grouped based on the function provided in the key argument, which by default is lambda x: x. Instead of the default we are utilizing some state by using a mutable structure (which could have been done a more intuitive way if using a normal function)

lambda x: not ((s and x == s.pop()) or s.add(x))

What is happening here is since I can't reassign a global assignment in a lambda (again I can do this, using a proper function), I just created a global mutable structure that I can add/remove. The key (no pun) is that I only keep elements that I need by using a short circuit to add/remove items as needed.

max and len are fairly self explanatory, to get the longest list produced by groupby

Another version without the mutable global structure business:

def longest(x):
     if hasattr(longest, 'last'):
         result = not (longest.last == x)
         longest.last = x
         return result
     longest.last = x
     return True


''.join(max((list(g) for _, g in groupby('pwwkewambb', key=longest)), key=len))
'kewamb'

edited Jan 7, 2019 at 18:25

answered Oct 6, 2017 at 20:58

salparadise

5,8751 gold badge28 silver badges32 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

Jean-François Fabre Over a year ago

I wouldn't call that intuitive but that's very good. Care to explain to the world how it works?

dgBP Over a year ago

This is some kind of genius. An explanation would make my life easier, too!

Jean-François Fabre Over a year ago

It's using groupby to create groups with characters not similar, using side effect in the key to add or remove from set, and use max to compute max string. Same principle as my solution but one-liner & comprehensions. Really nice.

Jean-François Fabre Over a year ago

I admit I had to google the question to see if it wasn't plagiarism because 1) it's very good 2) I don't know that user whereas I'm here like a lot and 3) no explanation. Didn't find that anywhere else. Please explain more, you'll get more upvotes.

salparadise Over a year ago

@Jean-FrançoisFabre explanation coming, I was bored at work. I would admit plagiarism but I can claim this one as my own, borrowing heavily from things I seen of course.

|

Jean-François Fabre · Accepted Answer · 2017-10-06 21:16:25Z

2

Not sure what is wrong in your attempt, but it's complex and in:

    for str in substr:
        print ("c: ",c," - str: ",str,"\n")
        if c in str:
            resultSet.append(str)
            substr.remove(str)

you're removing elements from a list while iterating on it: don't do that, it gives unexpected results.

Anyway, my solution, not sure it's intuitive, but it's probably simpler & shorter:

slice the string with an increasing index
for each slice, create a set and store letters until you reach the end of the string or a letter is already in the set. Your index is the max length
compute the max of this length for every iteration & store the corresponding string

Code:

def findLongest(s):
    maxlen = 0
    longest = ""
    for i in range(0,len(s)):
        subs = s[i:]
        chars = set()
        for j,c in enumerate(subs):
            if c in chars:
                break
            else:
                chars.add(c)
        else:
            # add 1 when end of string is reached (no break)
            # handles the case where the longest string is at the end
            j+=1
        if j>maxlen:
            maxlen=j
            longest=s[i:i+j]
    return longest

print(findLongest("pwwkewambb"))

result:

kewamb

edited Oct 6, 2017 at 21:16

answered Oct 6, 2017 at 20:56

Jean-François Fabre♦

141k24 gold badges179 silver badges246 bronze badges

4 Comments

dgBP Over a year ago

Good answer, but the case "bbbb" doesn't produce the expected "b"

Jean-François Fabre Over a year ago

yes, fixed (edge cases / index values to tune): now it works.

dgBP Over a year ago

I just realised this has a "for else" which I don't remember ever coming across. Learnt something else new.

Jean-François Fabre Over a year ago

that's a nice feature to avoid the use of a cumbersome flag.

M. Flanzer · Accepted Answer · 2018-12-16 23:25:31Z

Depends on your definition of repeated characters: if you mean consecutive, then the approved solution is slick, but not of characters appearing more than once (e.g.: pwwkewabmb -> 'kewabmb' ).

Here's what I came up with (Python 2):

def longest(word):
    begin = 0
    end = 0
    longest = (0,0)
    for i in xrange(len(word)):
        try:
            j = word.index(word[i],begin,end)
            # longest?
            if end-begin >= longest[1]-longest[0]:
                longest = (begin,end)
            begin = j+1
            if begin==end:
                end += 1
        except:
            end = i+1
    end=i+1
    if end-begin >= longest[1]-longest[0]:
        longest = (begin,end)
    return word[slice(*longest)]

Thus

>>> print longest('pwwkewabmb')
kewabm
>>> print longest('pwwkewambb')
kewamb
>>> print longest('bbbb')
b

gmagno · Accepted Answer · 2021-03-20 21:09:44Z

1

My 2-cents:

from collections import Counter

def longest_unique_substr(s: str) -> str:

    # get all substr-ings from s, starting with the longest one
    for substr_len in range(len(s), 0, -1):
        for substr_start_index in range(0, len(s) - substr_len + 1):
            substr = s[substr_start_index : substr_start_index + substr_len]

            # check if all substr characters are unique
            c = Counter(substr)
            if all(v == 1 for v in c.values()):
                return substr

    # ensure empty string input returns ""
    return ""

Run:

In : longest_unique_substr('pwwkewambb')
Out: 'kewamb'

edited Mar 20, 2021 at 21:09

answered Mar 20, 2021 at 20:54

gmagno

2,0581 gold badge25 silver badges46 bronze badges

Comments

Tejaswi devina · Accepted Answer · 2020-04-23 02:38:21Z

0

s=input()
ma=0
n=len(s)
l=[]
a=[]
d={}
st=0;i=0
while i<n:
    if s[i] not in d:
        d[s[i]]=i
        l.append(s[i])
    else:
        t=d[s[i]]
        d[s[i]]=i
        s=s[t+1:]
        d={}
        n=len(s)
        if len(l)>=3:
            a.append(l)
            ma=max(ma,len(l))
        l=[];i=-1
    i=i+1
if len(l)!=0 and len(l)>=3:
    a.append(l)
    ma=max(ma,len(l))
if len(a)==0:
    print("-1")
else:
    for i in a:
        if len(i)==ma:
            for j in i:
                print(j,end="")
            break

answered Apr 23, 2020 at 2:38

Tejaswi devina

1

1 Comment

NaN Over a year ago

Can you provide an example of inputs and outputs so that the efficacy of your code can be assessed.

Collectives™ on Stack Overflow

Find longest unique substring in string python

5 Answers 5

7 Comments

4 Comments

Comments

Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

7 Comments

4 Comments

Comments

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related