Find element in pandas dataframe

Question

I have a table:

    Name1   Name2           Name3
0     ABC     FGD             NNY
1  ABSTRE      PC  ABSTRE Tree in
2       P  ABSTRE             NNY
3     JJJ     FGD             NNY
4  ABSFRE      PC          ABSKRE

I need to get this list:

['ABSTRE', 'ABSFRE', 'ABSTRE', 'ABSKRE']

This means that the codes share the same first 3 letters and have the same length.

The same 3 letters: ABS
Length: 6

I need to get all such codes from the table. I think it should be something like this:

t='^[A-Z0-9]{3,10}?$'
for i in df.items():
    l=df[df[i].str.contains(t)]

Could you please help me with this?

Is ABC correct? Or is it a typo and should be ABS?

– jezrael, Commented Aug 20, 2018 at 7:37

Shaido · Accepted Answer · 2018-08-20 07:50:05Z

If the "ABS" letters are always at the beginning of the word, you can do the following:

df = df.stack()
values = df.loc[df.str.contains("^ABS[0-9a-zA-Z]{3}$")].tolist()

This will match all words of length 6 that start with "ABS". Result of print(values):

['ABSTRE', 'ABSTRE', 'ABSFRE', 'ABSKRE']
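
For a self-contained check, the sketch below rebuilds the question's table and applies the same stack-and-filter approach. The DataFrame constructor and the names `stacked`, `mask`, and `values` are assumptions for illustration; only the pattern and the `stack` / `str.contains` calls come from the answer above.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed to hold only strings).
df = pd.DataFrame({
    "Name1": ["ABC", "ABSTRE", "P", "JJJ", "ABSFRE"],
    "Name2": ["FGD", "PC", "ABSTRE", "FGD", "PC"],
    "Name3": ["NNY", "ABSTRE Tree in", "NNY", "NNY", "ABSKRE"],
})

# Flatten every cell into a single Series, row by row.
stacked = df.stack()

# Keep cells that are exactly "ABS" followed by three alphanumeric characters.
# The ^ and $ anchors reject longer cells such as "ABSTRE Tree in".
mask = stacked.str.contains(r"^ABS[0-9a-zA-Z]{3}$")
values = stacked[mask].tolist()
print(values)
```

Keeping the stacked Series under its own name, rather than reassigning `df`, leaves the original table available for later steps.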

jezrael · Accepted Answer · 2018-08-20 07:56:06Z

I believe you need to extract the codes of length 3 and of length 6:

print (df)
    Name1   Name2           Name3
0     ABS     FGD             NNY <- changed ABC to ABS
1  ABSTRE      PC  ABSTRE Tree in
2       P  ABSTRE             NNY
3     JJJ     FGD             NNY
4  ABSFRE      PC          ABSKRE

t1 = '^([A-Z0-9]{3})?$'
t2 = '^([A-Z0-9]{6})?$'

s = df.filter(like='Name').stack()

s1 = s.str.extract(t1, expand=False).dropna()
print (s1)
0  Name1    ABS
   Name2    FGD
   Name3    NNY
2  Name3    NNY
3  Name1    JJJ
   Name2    FGD
   Name3    NNY
dtype: object

s2 = s.str.extract(t2, expand=False).dropna()
print (s2)
1  Name1    ABSTRE
2  Name2    ABSTRE
4  Name1    ABSFRE
   Name3    ABSKRE
dtype: object

Then filter the second Series s2 by the first 3 characters of each value, using isin and boolean indexing:

L = s2[s2.str[:3].isin(s1)].tolist()
print (L)
['ABSTRE', 'ABSTRE', 'ABSFRE', 'ABSKRE']
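
The steps so far can be put together in one runnable sketch. The DataFrame constructor and the names `short_codes`, `long_codes`, and `result` are assumptions for illustration, and the patterns drop the optional `?` used above, which should not change the result for this data.

```python
import pandas as pd

# Table from this answer, with ABC changed to ABS in the first row.
df = pd.DataFrame({
    "Name1": ["ABS", "ABSTRE", "P", "JJJ", "ABSFRE"],
    "Name2": ["FGD", "PC", "ABSTRE", "FGD", "PC"],
    "Name3": ["NNY", "ABSTRE Tree in", "NNY", "NNY", "ABSKRE"],
})

# Flatten the Name* columns into one Series of cells.
s = df.filter(like="Name").stack()

# Cells that are exactly 3 or exactly 6 uppercase letters/digits;
# cells that do not match become NaN and are dropped.
short_codes = s.str.extract(r"^([A-Z0-9]{3})$", expand=False).dropna()
long_codes = s.str.extract(r"^([A-Z0-9]{6})$", expand=False).dropna()

# Keep 6-character codes whose first 3 characters occur as a 3-character code.
result = long_codes[long_codes.str[:3].isin(short_codes)].tolist()
print(result)
```

With the original table (ABC instead of ABS) no 3-character code equals "ABS", so this approach would return an empty list; that is why the sample data was changed.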

If you want to check against all of the 3-character substrings:

pat = r'\b{}\b'.format('|'.join(s1))
L = s2[s2.str.contains(pat)].tolist()
print (L)
['ABSTRE', 'ABSTRE', 'ABSFRE', 'ABSKRE']
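
One detail of the pattern above: `|` binds more loosely than `\b`, so the leading `\b` applies only to the first alternative and the trailing `\b` only to the last one. A possible variant, sketched here under the assumption that the codes should match only at the start of a word, groups the alternation and escapes each code. The sample Series and the value "XYZFGD" are hypothetical and added only to show the difference.

```python
import re
import pandas as pd

# Hypothetical stand-ins for s1 and s2; "XYZFGD" is an extra made-up value.
short_codes = pd.Series(["ABS", "FGD", "NNY", "NNY", "JJJ", "FGD", "NNY"])
long_codes = pd.Series(["ABSTRE", "ABSTRE", "ABSFRE", "ABSKRE", "XYZFGD"])

# Escape each code and wrap the alternation in a non-capturing group,
# so the leading \b applies to every alternative.
alternatives = "|".join(re.escape(code) for code in short_codes.unique())
pat = r"\b(?:{})".format(alternatives)

matches = long_codes[long_codes.str.contains(pat)].tolist()
print(matches)
```

Here "XYZFGD" is rejected because "FGD" does not start at a word boundary; with the ungrouped pattern it would be accepted, since the middle alternatives carry no boundary at all.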

If you want to extract all values of length 6 that start with ABS, use extract:

t = "^(ABS[0-9a-zA-Z]{3})$"
L = df.filter(like='Name').stack().str.extract(t, expand=False).dropna().tolist()
print (L)

['ABSTRE', 'ABSTRE', 'ABSFRE', 'ABSKRE']

Alternatively, use @Shaido's answer.
