1

I have a table:

    Name1   Name2           Name3
0     ABC     FGD             NNY
1  ABSTRE      PC  ABSTRE Tree in
2       P  ABSTRE             NNY
3     JJJ     FGD             NNY
4  ABSFRE      PC          ABSKRE

I need to get this result:

['ABSTRE', 'ABSFRE', 'ABSTRE', 'ABSKRE']

This means the codes share the same 3 leading letters and have the same length:

  • The same 3 letters: ABS
  • Length: 6

I need to get all such codes from the table. I think it should be something like this:

t='^[A-Z0-9]{3,10}?$'
for i in df.items():
    l=df[df[i].str.contains(t)]

Could you please help me with this?

1
  • Is ABC correct? Or is it a typo and should be ABS? Commented Aug 20, 2018 at 7:37

2 Answers

1

If the "ABS" letters are always at the beginning of the word, you can do the following:

df = df.stack()
values = df.loc[df.str.contains("^ABS[0-9a-zA-Z]{3}$")].tolist()

This matches all words of length 6 that start with "ABS". Result of print(values):

['ABSTRE', 'ABSTRE', 'ABSFRE', 'ABSKRE']
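
To make the role of the `^` and `$` anchors visible, here is a minimal sketch that uses only Python's standard `re` module instead of pandas. The cell values are copied from the question's table; the names `cells`, `values` and `loose` are introduced here purely for illustration. `re.search` is used because it behaves like `Series.str.contains` applied to each cell.

```python
import re

# Cell values from the question's table, flattened row by row.
cells = [
    "ABC", "FGD", "NNY",
    "ABSTRE", "PC", "ABSTRE Tree in",
    "P", "ABSTRE", "NNY",
    "JJJ", "FGD", "NNY",
    "ABSFRE", "PC", "ABSKRE",
]

# Anchored at both ends: exactly "ABS" plus three alphanumeric characters.
pattern = re.compile(r"^ABS[0-9a-zA-Z]{3}$")
values = [c for c in cells if pattern.search(c)]
print(values)

# Dropping the trailing `$` also lets the longer cell through.
unanchored = re.compile(r"^ABS[0-9a-zA-Z]{3}")
loose = [c for c in cells if unanchored.search(c)]
print(loose)
```

The second list additionally contains 'ABSTRE Tree in', which is why the closing `$` matters when the code must be the whole cell.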

1

I believe you need to extract codes with lengths 3 and 6:

print (df)
    Name1   Name2           Name3
0     ABS     FGD             NNY <- changed ABC to ABS
1  ABSTRE      PC  ABSTRE Tree in
2       P  ABSTRE             NNY
3     JJJ     FGD             NNY
4  ABSFRE      PC          ABSKRE

t1 = '^([A-Z0-9]{3})?$'
t2 = '^([A-Z0-9]{6})?$'

s = df.filter(like='Name').stack()

s1 = s.str.extract(t1, expand=False).dropna()
print (s1)
0  Name1    ABS
   Name2    FGD
   Name3    NNY
2  Name3    NNY
3  Name1    JJJ
   Name2    FGD
   Name3    NNY
dtype: object

s2 = s.str.extract(t2, expand=False).dropna()
print (s2)
1  Name1    ABSTRE
2  Name2    ABSTRE
4  Name1    ABSFRE
   Name3    ABSKRE
dtype: object

Then filter the second Series, s2, with boolean indexing, keeping the values whose first 3 characters appear in s1:

L = s2[s2.str[:3].isin(s1)].tolist()
print (L)
['ABSTRE', 'ABSTRE', 'ABSFRE', 'ABSKRE']
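
The same two-step idea can be sketched without pandas, which may help when checking the logic by hand. This assumes the modified table above (ABC changed to ABS); the names `cells`, `short_codes`, `long_codes` and `matches` are hypothetical, and `re.fullmatch` stands in for the `^...$` anchors.

```python
import re

# Cell values from the modified table (ABC changed to ABS), row by row.
cells = [
    "ABS", "FGD", "NNY",
    "ABSTRE", "PC", "ABSTRE Tree in",
    "P", "ABSTRE", "NNY",
    "JJJ", "FGD", "NNY",
    "ABSFRE", "PC", "ABSKRE",
]

# Step 1: collect the 3-character codes (a set, since only membership matters).
short_codes = {c for c in cells if re.fullmatch(r"[A-Z0-9]{3}", c)}

# Step 2: collect the 6-character codes, keeping order and duplicates.
long_codes = [c for c in cells if re.fullmatch(r"[A-Z0-9]{6}", c)]

# Step 3: keep the long codes whose first 3 characters are a known short code.
matches = [c for c in long_codes if c[:3] in short_codes]
print(matches)
```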

If you want to check against all substrings:

# alternation of the unique 3-character codes, matched anywhere in the value
pat = '|'.join(s1.unique())
L = s2[s2.str.contains(pat)].tolist()
print (L)
['ABSTRE', 'ABSTRE', 'ABSFRE', 'ABSKRE']
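
When an alternation is built from data like this, two details are worth checking: escaping and grouping. In a regex, `|` has the lowest precedence, so an anchor written outside a group applies only to the first or last alternative. The sketch below uses the standard `re` module with hypothetical names (`short_codes`, `long_codes`, `prefix_pat`, `ungrouped`) and hard-coded values taken from the outputs above; it is one possible way to restrict the check to prefixes, not the only one.

```python
import re

short_codes = ["ABS", "FGD", "NNY", "JJJ"]
long_codes = ["ABSTRE", "ABSTRE", "ABSFRE", "ABSKRE"]

# Escape each code so regex metacharacters in the data are taken literally.
alternatives = "|".join(re.escape(code) for code in short_codes)

# Grouped: the `^` anchor applies to every alternative.
prefix_pat = re.compile(r"^(?:{})".format(alternatives))
by_prefix = [c for c in long_codes if prefix_pat.search(c)]
print(by_prefix)

# Ungrouped: `^` binds to "ABS" only, so "FGD" can match anywhere.
ungrouped = re.compile("^" + alternatives)
print(bool(ungrouped.search("XXXFGD")))   # matched in the middle of the string
print(bool(prefix_pat.search("XXXFGD")))  # not a prefix, so no match
```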

If you want to extract all values of length 6 that start with ABS, use extract:

t = "^(ABS[0-9a-zA-Z]{3})$"
L = df.filter(like='Name').stack().str.extract(t, expand=False).dropna().tolist()
print (L)

['ABSTRE', 'ABSTRE', 'ABSFRE', 'ABSKRE']

Alternatively, see @Shaido's answer.
