0

I need to detect and count how many of the values in the UF_med column also appear in the UF_cadastral column of the same row.

This is my dataset:

import pandas as pd

df = {'id': [1,2,3],
        'UF_med':[['SP', 'SC', 'PA'], ['SP'], ['AM', 'RJ', 'PA', 'RS']],
        'UF_cadastral': [['SP', 'PA'], ['SP'], ['AM', 'RS']]}

df = pd.DataFrame(df)
df.head()

Although I need to count the patterns, I first tried to at least detect one pattern. However, the code only detects the first pattern of the UF_med variable. This is the code I used:

df['Detect_Municipio'] = df.apply(lambda x: x['UF_med'] in x['UF_cadastral'], axis=1)

The result should look like this:

df = {'id': [1,2,3],
        'UF_med':[['SP', 'SC', 'PA'], ['SP'], ['AM', 'RJ', 'PA', 'RS']],
        'UF_cadastral': [['SP', 'PA'], ['SP'], ['AM', 'RS']],
        'Detect_Municipio':[2,1,2]}

df = pd.DataFrame(df)

  • What should be the output ? Commented Aug 21, 2020 at 20:30
  • df = {'id': [1,2,3], 'UF_med':[['SP', 'SC', 'PA'], ['SP'], ['AM', 'RJ', 'PA', 'RS']], 'UF_cadastral': [['SP', 'PA'], ['SP'], ['AM', 'RS']], 'Detect_Municipio':[2,1,2]} df = pd.DataFrame(df) Commented Aug 21, 2020 at 20:41

3 Answers

1
df['check'] = [list(set(x).intersection(set(y)))
               for x, y in zip(df.UF_med, df.UF_cadastral)]

df['count'] = df.check.str.len()

This gives:

   id          UF_med   UF_cadastral     check     count
0   1      [SP, SC, PA]     [SP, PA]  [SP, PA]      2
1   2              [SP]         [SP]      [SP]      1
2   3  [AM, RJ, PA, RS]     [AM, RS]  [AM, RS]      2

Or, to get the count directly, just replace list with len as follows:

df['amount']=[len(set(x).intersection(set(y))) for x, y in zip(df.UF_med, df.UF_cadastral)]

The result would be:

   id            UF_med UF_cadastral  amount
0   1      [SP, SC, PA]     [SP, PA]       2
1   2              [SP]         [SP]       1
2   3  [AM, RJ, PA, RS]     [AM, RS]       2
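
One thing to keep in mind with this approach: converting to sets discards duplicates, so a code that appears twice in UF_med is counted once. The question does not say whether repeated entries can occur or how they should be counted, so the sketch below is only an illustration under that assumption. The count_matches helper and the sample lists with a repeated 'SP' are hypothetical and not part of the original data.

```python
def count_matches(med, cadastral, keep_duplicates=False):
    """Count entries of `med` that also appear in `cadastral`.

    Hypothetical helper: with keep_duplicates=True every repeated entry
    in `med` is counted, otherwise each distinct value is counted once.
    """
    lookup = set(cadastral)  # set gives fast membership tests
    if keep_duplicates:
        return sum(item in lookup for item in med)
    return len(set(med) & lookup)


# Made-up row with 'SP' repeated in the first list
med = ['SP', 'SP', 'PA', 'SC']
cad = ['SP', 'PA']

print(count_matches(med, cad))                        # 2
print(count_matches(med, cad, keep_duplicates=True))  # 3
```

On the data in the question, which has no repeated entries, both variants return the same counts.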

0

Try changing

df['Detect_Municipio'] = df.apply(lambda x: x['UF_med'] in x['UF_cadastral'], axis=1)

to

df['Detect_Municipio'] = df.apply(
    lambda x: len(set(x['UF_med']) & set(x['UF_cadastral'])), axis=1)

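The elements of your table are lists. The expression x['UF_med'] in x['UF_cadastral'] asks whether the whole UF_med list is itself one element of UF_cadastral, which is not what you want. Converting both lists to sets and intersecting them (the & operator) gives the elements common to both, and len then gives the number of matches.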
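
To make the difference concrete, here is a small sketch that runs both expressions on the data from the question. It assumes pandas is imported as pd; the variable names as_membership and as_count are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'UF_med': [['SP', 'SC', 'PA'], ['SP'], ['AM', 'RJ', 'PA', 'RS']],
    'UF_cadastral': [['SP', 'PA'], ['SP'], ['AM', 'RS']],
})

# `list in list` checks whether the left-hand list, as a whole, is one of
# the elements of the right-hand list. The elements here are strings,
# so a list never matches any of them.
as_membership = df.apply(lambda x: x['UF_med'] in x['UF_cadastral'], axis=1)

# Set intersection compares the individual strings instead.
as_count = df.apply(
    lambda x: len(set(x['UF_med']) & set(x['UF_cadastral'])), axis=1)

print(as_membership.tolist())  # [False, False, False]
print(as_count.tolist())       # [2, 1, 2]
```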


0

I don't know exactly what you're trying to accomplish, but here are my best guesses:

Find overlapping lists in the two lists:

df = {'id': [1,2,3],
        'UF_med':[['SP', 'SC', 'PA'], ['SP'], ['AM', 'RJ', 'PA', 'RS']],
        'UF_cadastral': [['SP', 'PA'], ['SP'], ['AM', 'RS']]}

output = [item for item in df["UF_med"] if item in df["UF_cadastral"]]
#output is [['SP']]

Find overlapping strings in all lists:

df = {'id': [1,2,3],
        'UF_med':[['SP', 'SC', 'PA'], ['SP'], ['AM', 'RJ', 'PA', 'RS']],
        'UF_cadastral': [['SP', 'PA'], ['SP'], ['AM', 'RS']]}

uf_med = {item for sublist in df["UF_med"] for item in sublist}
uf_cadastral = {item for sublist in df["UF_cadastral"] for item in sublist}
output = sorted(item for item in uf_med if item in uf_cadastral)
#output is ['AM', 'PA', 'RS', 'SP'] (sorted, because set iteration order is not guaranteed)

Find overlapping strings in same-index lists:

df = {'id': [1,2,3],
        'UF_med':[['SP', 'SC', 'PA'], ['SP'], ['AM', 'RJ', 'PA', 'RS']],
        'UF_cadastral': [['SP', 'PA'], ['SP'], ['AM', 'RS']]}

output = [{item for item in list1 if item in list2} for list1, list2 in zip(df["UF_med"], df["UF_cadastral"])]
#output is [{'PA', 'SP'}, {'SP'}, {'AM', 'RS'}]
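
If this third reading is the intended one, the per-row sets map directly onto the column the question asks for. A minimal sketch, assuming the data is loaded into a pandas DataFrame as in the question rather than kept as a plain dict (the name overlaps is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'UF_med': [['SP', 'SC', 'PA'], ['SP'], ['AM', 'RJ', 'PA', 'RS']],
    'UF_cadastral': [['SP', 'PA'], ['SP'], ['AM', 'RS']],
})

# One set of shared values per row, pairing the two columns by position
overlaps = [set(a) & set(b) for a, b in zip(df['UF_med'], df['UF_cadastral'])]

# The size of each set is the number of matches for that row
df['Detect_Municipio'] = [len(s) for s in overlaps]

print(df['Detect_Municipio'].tolist())  # [2, 1, 2]
```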
