I have a dataframe in Pandas with 729278 rows and 190 columns:

df1:

+----------+----------+----------+---+---+-----+---------+
| RULE_1_2 | RULE_2_2 | RULE_3_2 | … | … | smt | default |
+----------+----------+----------+---+---+-----+---------+
| 0        | 0        | 0        | … | … | 2   | 0       |
| 0        | 2        | 3        | … | … | 3   | 0       |
| 1        | 3        | 0        | … | … | 4   | 1       |
| …        | …        | …        | … | … | …   | …       |
+----------+----------+----------+---+---+-----+---------+

I am trying to extract all columns whose names contain RULE, plus the column 'default'.

Code:

df2 = df1[df1.filter(regex='RULE'), df1["default"]]

But Python says:

[729278 rows x 1 columns])' is an invalid key

All columns are of type int64, which is confirmed by df1.dtypes.

What's wrong with the column 'default'? It doesn't appear in the dataframe 'df2'. How can I fix it?

1 Answer

The idea is to add a second alternative to the regex, joined with | (regex OR). The anchors ^ (start of string) and $ (end of string) prevent selecting columns with names like some data default:

df2 = df1.filter(regex='RULE|^default$')
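
As a minimal sketch of how this behaves, the example below builds a tiny stand-in for df1 (the values and the extra column some data default are made up for illustration) and compares the regex approach with an explicit column list, which is one possible alternative and not something the original code used:

```python
import pandas as pd

# Small illustrative frame; the real df1 has 729278 rows and 190 columns.
# "some data default" is a hypothetical column added to show what the anchors exclude.
df1 = pd.DataFrame({
    "RULE_1_2": [0, 0, 1],
    "RULE_2_2": [0, 2, 3],
    "smt": [2, 3, 4],
    "some data default": [9, 9, 9],
    "default": [0, 0, 1],
})

# Regex approach: any label containing RULE, or a label that is exactly "default".
df2 = df1.filter(regex="RULE|^default$")

# Alternative: build the list of column labels explicitly and index with it.
rule_cols = [c for c in df1.columns if "RULE" in c]
df3 = df1[rule_cols + ["default"]]

print(list(df2.columns))
```

Both selections keep the RULE columns and default, and drop smt and some data default. Because filter matches the pattern anywhere in a label, RULE needs no anchors, while default does if similarly named columns exist.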