How to count the instances of unique values in Python Dataframe

Question

I have a dataframe like the one below, with 2 million rows. The sample data can be found here.

The list of matches in every row can contain any numbers from 1 to 761. I want to count the occurrences of every number from 1 to 761 across the whole matches column. For example, the result for the above data will be:

If a particular id is not found, its count should be 0 in the output. I tried a for-loop approach, but it is quite slow.

import pandas as pd

def readData():
    df = pd.read_excel(file_path)  # file_path is defined elsewhere

    # One counter per pattern id (1..761); index 0 holds the count for id 1
    pattern_match_count = [0] * 761
    for index, row in df.iterrows():
        matches = row["matches"]

        for pattern_id in range(1, 762):
            if pattern_id in matches:
                pattern_match_count[pattern_id - 1] += 1

    return pattern_match_count

Is there a better approach with pandas that would make this faster?

Please provide a reproducible, self-sufficient input, not images.
– mozway, Commented Oct 4, 2022 at 17:26

Benjamin Rio · Accepted Answer · 2022-10-05 00:01:49Z

2

You can use the .explode() method to "explode" the lists into new rows.

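import pandas as pd

def readData():
    df = pd.read_excel(file_path)  # file_path is defined elsewhere
    # Select the "matches" column first, then explode and count
    return df.loc[:, "matches"].explode().value_counts()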
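A minimal sketch of the same idea on a small in-memory frame, assuming the column is named matches as in the question and already holds Python lists (values read from Excel may arrive as strings and would need parsing first). The sample data is hypothetical, and the reindex step is an addition that is not part of the answer: it fills in 0 for ids from 1 to 761 that never appear.

```python
import pandas as pd

# Hypothetical sample data standing in for the real file.
df = pd.DataFrame({"matches": [[1, 2, 3], [1, 3, 3, 4]]})

counts = (
    df["matches"]
    .explode()          # one row per list element
    .dropna()           # empty lists explode to NaN; drop them
    .astype("int64")    # explode leaves object dtype; make the ids integers
    .value_counts()     # occurrences of each id
    .reindex(range(1, 762), fill_value=0)  # ids 1..761, 0 when absent
)

print(counts.head())
```

The result is a Series indexed by id, so counts.loc[3] gives the count for id 3.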

edited Oct 5, 2022 at 0:01

answered Oct 4, 2022 at 18:02

Benjamin Rio

676 reputation, 3 silver badges, 17 bronze badges


2 Comments

Andrew Over a year ago

Just a note that you can apply explode to the series directly, which may save memory for a large dataframe: df.loc[:, "matches"].explode().value_counts()

Benjamin Rio Over a year ago

@Andrew Thanks. Selecting then applying is better memory-wise than applying then selecting. I have edited my answer and corrected a little typo you made in your suggestion.

dermen · Answer · 2022-10-04 19:11:43Z

0

You can use collections.Counter:

import pandas as pd
from collections import Counter

df = pd.DataFrame({"matches": [[1, 2, 3], [1, 3, 3, 4]]})

#df:
#        matches
#0     [1, 2, 3]
#1  [1, 3, 3, 4]

# Flatten all the lists and count every id
C = Counter(i for sl in df.matches for i in sl)
#C:
#Counter({3: 3, 1: 2, 2: 1, 4: 1})

pd.DataFrame(C.items(), columns=["match_id", "counts"])
#   match_id  counts
#0         1       2
#1         2       1
#2         3       3
#3         4       1

If you want zeros for match_ids that aren't in any of the matches, then you can update the Counter object C:

for i in range(1,762):
    if i not in C:
        C[i] = 0
pd.DataFrame(C.items(), columns=["match_id", "counts"])
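One thing to be aware of: the loop in the question tests pattern_id in matches, so it increments a count at most once per row, while flattening every list counts a repeated id each time it appears. If the per-row behaviour is what is wanted, one possible way to get it, sketched here on the same sample frame and not taken from the answer, is to deduplicate each list with set before counting. The sketch also builds the table in id order, with zeros included.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({"matches": [[1, 2, 3], [1, 3, 3, 4]]})

# Count each id at most once per row by deduplicating the row's list first.
rows_containing = Counter(i for sl in df["matches"] for i in set(sl))

# Build a table covering every id from 1 to 761, in order.
# A Counter returns 0 for keys it has never seen, so no explicit fill is needed.
ids = list(range(1, 762))
result = pd.DataFrame(
    {"match_id": ids, "counts": [rows_containing[i] for i in ids]}
)

print(result.head())
```

Here id 3 gets a count of 2 (it appears in two rows) rather than 3 (its total number of occurrences).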

edited Oct 4, 2022 at 19:11

answered Oct 4, 2022 at 19:04

dermen

5,402 reputation, 4 gold badges, 25 silver badges, 37 bronze badges
