Detect and count pattern in two lists in a dataframe

Question

I need to detect and count, row by row, how many patterns (values) from the UF_med variable are also present in the UF_cadastral variable.

This is my dataset:

import pandas as pd

df = {'id': [1,2,3],
        'UF_med':[['SP', 'SC', 'PA'], ['SP'], ['AM', 'RJ', 'PA', 'RS']],
        'UF_cadastral': [['SP', 'PA'], ['SP'], ['AM', 'RS']]}

df = pd.DataFrame(df)
df.head()

Although I need to count the patterns, I first tried to detect at least one. However, the code only detects the first pattern of the UF_med variable. I used this code:

df['Detect_Municipio'] = df.apply(lambda x: x['UF_med'] in x['UF_cadastral'], axis=1)

The result should look like this:

df = {'id': [1,2,3],
        'UF_med':[['SP', 'SC', 'PA'], ['SP'], ['AM', 'RJ', 'PA', 'RS']],
        'UF_cadastral': [['SP', 'PA'], ['SP'], ['AM', 'RS']],
        'Detect_Municipio':[2,1,2]}

df = pd.DataFrame(df)


wwnde · Accepted Answer · 2020-08-21 20:57:02Z

1

df['check'] = [list(set(x).intersection(set(y)))
               for x, y in zip(df.UF_med, df.UF_cadastral)]

df['count'] = df.check.str.len()

   id            UF_med UF_cadastral     check  count
0   1      [SP, SC, PA]     [SP, PA]  [SP, PA]      2
1   2              [SP]         [SP]      [SP]      1
2   3  [AM, RJ, PA, RS]     [AM, RS]  [AM, RS]      2

Or, to get the count directly, replace list with len as follows:

df['amount']=[len(set(x).intersection(set(y))) for x, y in zip(df.UF_med, df.UF_cadastral)]

Result would be:

   id            UF_med UF_cadastral  amount
0   1      [SP, SC, PA]     [SP, PA]       2
1   2              [SP]         [SP]       1
2   3  [AM, RJ, PA, RS]     [AM, RS]       2
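
One caveat: set intersection counts distinct values, so an entry repeated within a list is counted once. If the lists might contain duplicates and every occurrence should count, something like the sketch below could be used instead. The helper name count_matches, the column names distinct and with_repeats, and the sample data with a repeated 'SP' are assumptions made for illustration; they are not part of the original question.

```python
import pandas as pd

def count_matches(candidates, reference):
    """Count the elements of `candidates` that occur in `reference`, repeats included."""
    reference_set = set(reference)  # a set gives fast membership tests
    return sum(item in reference_set for item in candidates)

# Hypothetical data: the first row repeats 'SP' on purpose.
df = pd.DataFrame({
    'id': [1, 2],
    'UF_med': [['SP', 'SP', 'PA'], ['AM', 'RJ']],
    'UF_cadastral': [['SP', 'PA'], ['AM', 'RS']],
})

# Distinct shared values (the set-intersection approach from this answer).
df['distinct'] = [len(set(x) & set(y)) for x, y in zip(df.UF_med, df.UF_cadastral)]

# Every occurrence in UF_med that is found in UF_cadastral.
df['with_repeats'] = [count_matches(x, y) for x, y in zip(df.UF_med, df.UF_cadastral)]
```

For the data in the question, where no list repeats a value, both columns would hold the same numbers.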

edited Aug 21, 2020 at 20:57

answered Aug 21, 2020 at 20:46


David · Accepted Answer · 2020-08-21 20:54:03Z

0

Try changing

df['Detect_Municipio'] = df.apply(lambda x: x['UF_med'] in x['UF_cadastral'], axis=1)

to

df['Detect_Municipio'] = df.apply(
    lambda x: len(set(x['UF_med']) & set(x['UF_cadastral'])), axis=1)

The cells of your table hold lists, so you can convert each list to a set and use set intersection (&) to get the elements the two lists have in common. len then gives you the number of matches.
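
To see why the original expression does not count matches: x['UF_med'] in x['UF_cadastral'] asks whether the whole UF_med list is itself an element of UF_cadastral, which is False for every row of this data. A minimal sketch comparing the two expressions on the data from the question (the column name Whole_List_In is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'UF_med': [['SP', 'SC', 'PA'], ['SP'], ['AM', 'RJ', 'PA', 'RS']],
    'UF_cadastral': [['SP', 'PA'], ['SP'], ['AM', 'RS']],
})

# Original attempt: is the list UF_med itself an element of UF_cadastral?
df['Whole_List_In'] = df.apply(lambda x: x['UF_med'] in x['UF_cadastral'], axis=1)

# Set intersection: how many distinct values do the two lists share?
df['Detect_Municipio'] = df.apply(
    lambda x: len(set(x['UF_med']) & set(x['UF_cadastral'])), axis=1)

print(df[['id', 'Whole_List_In', 'Detect_Municipio']])
```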

answered Aug 21, 2020 at 20:54


Elan-R · Accepted Answer · 2020-08-21 21:01:41Z

I don't know exactly what you're trying to accomplish, but here are my best guesses:

Find overlapping lists in the two lists:

df = {'id': [1,2,3],
        'UF_med':[['SP', 'SC', 'PA'], ['SP'], ['AM', 'RJ', 'PA', 'RS']],
        'UF_cadastral': [['SP', 'PA'], ['SP'], ['AM', 'RS']]}

output = [item for item in df["UF_med"] if item in df["UF_cadastral"]]
#output is [['SP']]

Find overlapping strings in all lists:

df = {'id': [1,2,3],
        'UF_med':[['SP', 'SC', 'PA'], ['SP'], ['AM', 'RJ', 'PA', 'RS']],
        'UF_cadastral': [['SP', 'PA'], ['SP'], ['AM', 'RS']]}

uf_med = {item for sublist in df["UF_med"] for item in sublist}
uf_cadastral = {item for sublist in df["UF_cadastral"] for item in sublist}
output = sorted(item for item in uf_med if item in uf_cadastral)  # sorted, because set iteration order is arbitrary
#output is ['AM', 'PA', 'RS', 'SP']

Find overlapping strings in same-index lists:

df = {'id': [1,2,3],
        'UF_med':[['SP', 'SC', 'PA'], ['SP'], ['AM', 'RJ', 'PA', 'RS']],
        'UF_cadastral': [['SP', 'PA'], ['SP'], ['AM', 'RS']]}

output = [{item for item in list1 if item in list2} for list1, list2 in zip(df["UF_med"], df["UF_cadastral"])]
#output is [{'PA', 'SP'}, {'SP'}, {'AM', 'RS'}]
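
If the goal is the per-row count from the question rather than the matching values themselves, taking len of each set should be enough. A short sketch on the same plain dict (the names data, overlaps, and counts are chosen here for illustration):

```python
data = {'id': [1, 2, 3],
        'UF_med': [['SP', 'SC', 'PA'], ['SP'], ['AM', 'RJ', 'PA', 'RS']],
        'UF_cadastral': [['SP', 'PA'], ['SP'], ['AM', 'RS']]}

# One set of shared values per pair of same-index lists.
overlaps = [{item for item in list1 if item in list2}
            for list1, list2 in zip(data["UF_med"], data["UF_cadastral"])]

# The size of each set is the number of distinct matches in that row.
counts = [len(overlap) for overlap in overlaps]
print(counts)  # [2, 1, 2]
```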
