Count list occurence in list python

Question

I'd like to count how many times a big list contains elements in specific order. So for example if i have elements [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5] and i'd like to know how many times [1,2,3] are next to each other (answer is 4 in this case).

I was thinking on checking the indexes of number '3' (so currently it'd return [2,7,12,17]. Then i would iterate over that list, take elements in positions described in the list and check two positions in front of it. If they match '1' and '2' then add 1 to counter and keep looking. I believe this solution isn't really efficient and does not look nice, would there be a better solution?

Do you need to count overlapping matches as well? For example, how many times is [1, 1] contained in [1, 1, 1]? — vaultah
– vaultah, Commented Feb 25, 2017 at 14:29

vaultah · Accepted Answer · 2017-02-25 14:30:31Z

5

Here's a generalized solution that works for subsequences of any size and for elements of any type. It's also very space-efficient, as it only operates on iterators.

from itertools import islice

def count(lst, seq):
    it = zip(*(islice(lst, i, None) for i in range(len(seq))))
    seq = tuple(seq)
    return sum(x == seq for x in it)

In [4]: count(l, (1, 2, 3))
Out[4]: 4

The idea is to create a sliding window iterator of width len(seq) over lst, and count the number of tuples equal to tuple(seq). This means that count also counts overlapping matches:

In [5]: count('aaa', 'aa')
Out[5]: 2

edited Feb 25, 2017 at 14:30

answered Feb 25, 2017 at 13:16

vaultah

46.9k12 gold badges120 silver badges145 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Aran-Fey Over a year ago

Since OP's list seems to have ~6 million elements, it'd probably be a good idea to use islice.

vaultah Over a year ago

@Rawing: yes, agreed.

John Coleman · Accepted Answer · 2017-02-25 13:41:46Z

2

For lists of ints, you could convert to strings and then use the count method:

>>> x = [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5]
>>> y = [1,2,3]
>>> s = ',' + ','.join(str(i) for i in x) + ','
>>> t = ',' + ','.join(str(i) for i in y) + ','
>>> s.count(t)
4

If the items in the list contained strings which contain commas, this method could fail (as @schwobaseggl points out in the comments). You would need to pick a delimiter known not to occur in any of the strings, or adopt an entirely different approach which doesn't reduce to the string count method.

On Edit: I added a fix suggested by @Rawing to address a bug pointed out by @tobias_k . This turns out to be a more subtle problem than it first seems.

edited Feb 25, 2017 at 13:41

answered Feb 25, 2017 at 13:12

John Coleman

52.1k7 gold badges59 silver badges127 bronze badges

10 Comments

user2390182 Over a year ago

That turns [12, 3] and [1, 23] into the same entity.

Banana Over a year ago

Would it run in a normal time if my list has ~6 million elements?

John Coleman Over a year ago

@schwobaseggl I never claimed that my solution was a general purpose method for counting arbitrary sublists. It is a perfectly adequate quick-and-dirty approach which will cover most cases, including the example from the question. Still, you raise a good point. Care would have to be taken with the choice of delimiter in the case of lists of strings.

tobias_k Over a year ago

Even for int lists, this could fail if the search pattern is 1,2,3 and the list contains e.g. 11,2,33.

Aran-Fey Over a year ago

@JohnColeman It's not really a fatal objection. You just need to put commas at the start and the end of both strings.

|

René · Accepted Answer · 2017-02-25 13:27:21Z

1

x = [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5]
y = [1,2,3]

count = 0
for i in range(len(x)-len(y)):
    if x[i:i+len(y)] == y:
        count += 1
print(count)

answered Feb 25, 2017 at 13:27

René

4,9195 gold badges29 silver badges59 bronze badges

Comments

tobias_k · Accepted Answer · 2017-02-25 13:16:16Z

0

You could iterate the list and compare sublists:

In [1]: lst = [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5]
In [2]: sub = [1,2,3]
In [3]: [i for i, _ in enumerate(lst) if lst[i:i+len(sub)] == sub]
Out[3]: [0, 5, 10, 15]

Note, however, that on a very large list and sublist, this is pretty wasteful, as it creates very many slices of the original list to compare against the sublist. In a slightly longer version, you could use all to compare each of the relevant positions of the list with those of the sublist:

In [5]: [i for i, _ in enumerate(lst) if all(lst[i+k] == e for k, e in enumerate(sub))]
Out[5]: [0, 5, 10, 15]

answered Feb 25, 2017 at 13:16

tobias_k

83.1k12 gold badges130 silver badges186 bronze badges

Comments

Rob Grossman · Accepted Answer · 2017-02-25 13:18:12Z

This strikes me as the longest common subsequence problem repeated every time until the sequence returned is an empty list.

I think that the best that you can do in this case for an efficient algorithm is O(n*m) where n is the number of elements in your big list and m is the number of elements in your small list. You of course would have to have an extra step of removing the small sequence from the big sequence and repeating the process.

Here's the algorithm:

Find lcs(bigList, smallList)
Remove the first occurrence of the smallList from the bigList
Repeat until lcs is an empty list
Return the number of iterations

Here is an implementation of lcs that I wrote in python:

def lcs(first, second):
    results = dict()
    return lcs_mem(first, second, results)

def lcs_mem(first, second, results):
    key = ""
    if first > second:
        key = first + "," + second
    else:
        key = second + "," + first

    if len(first) == 0 or len(second) == 0:
        return ''
    elif key in results:
        return results[key]
    elif first[-1] == second[-1]:
        result = lcs(first[:-1], second[:-1]) + first[-1]
        results[key] = result
        return result
    else:
        lcsLeft = lcs(first[:-1], second)
        lcsRight = lcs(first, second[:-1])

        if len(lcsLeft) > len(lcsRight):
            return lcsLeft
        else:
            return lcsRight

def main():
    pass


if __name__ == '__main__':
    main()

Feel free to modify it to the above algorithm.

Ujjaval Moradiya · Accepted Answer · 2017-02-25 13:32:47Z

0

One can define the efficiency of solution from the complexity. Search more about complexity of algorithm in google.

And in your case, complexity is 2n where n is number of elements.

Here is the solution with the complexity n, cause it traverses the list only once, i.e. n number of times.

def IsSameError(x,y):
    if (len(x) != len(y)):
        return False

    i = 0
    while (i < len(y)):
        if(x[i] != y[i]):
            return False
        i += 1

    return True

x = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
y = [1, 2, 3]

xLength = len(x)
yLength = len(y)
cnt = 0
answer = []
while (cnt+3 < xLength):
    if(IsSameError([x[cnt], x[cnt+1], x[cnt+2]], y)):
        answer.append(x[cnt])
        answer.append(x[cnt+1])
        answer.append(x[cnt + 2])
        cnt = cnt + 3
    else:
        cnt = cnt + 1


print answer

answered Feb 25, 2017 at 13:32

Ujjaval Moradiya

3261 silver badge12 bronze badges

2 Comments

reticentroot Over a year ago

It might have a lower complexity but it is hard to say that it is better without bench marking it. My reasoning being in Python for loops run at c level speeds while While loops do not. Which means looping with a for loop in Python is faster. Likewise you are creating memory allocation which with your appends, which is also a heavy process. As such the trade in complexity may not be worth it in this scenario.

Ujjaval Moradiya Over a year ago

I don't know the difference between for loop and while loop. Can you please give the reason why it is slower. And if you are comparing my answer with other answer then I can skip the append part and we can also replace while loop with for loop. But yes great knowledge.

Collectives™ on Stack Overflow

Count list occurence in list python

6 Answers 6

2 Comments

10 Comments

Comments

Comments

Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

6 Answers 6

2 Comments

10 Comments

Comments

Comments

Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related