Pandas: select rows from columns using Regex

Question

I want to select the rows whose feccandid value starts with H or S:

   cid        amount  date  catcode  feccandid
0  N00031317  1000    2010  B2000    H0FL19080
1  N00027464  5000    2009  B1000    H6IA01098
2  N00024875  1000    2009  A5200    S2IL08088
3  N00030957  2000    2010  J2200    S0TN04195
4  N00026591  1000    2009  F3300    S4KY06072
5  N00031317  1000    2010  B2000    P0FL19080
6  N00027464  5000    2009  B1000    P6IA01098
7  N00024875  1000    2009  A5200    S2IL08088
8  N00030957  2000    2010  J2200    H0TN04195
9  N00026591  1000    2009  F3300    H4KY06072

I am using this code:

campaign_contributions.loc[campaign_contributions['feccandid'].astype(str).str.extractall(r'^(?:S|H)')]

This raises an error: ValueError: pattern contains no capture groups

Does anyone with experience using Regex know what I am doing wrong?

sbha · Accepted Answer · 2018-11-18 04:07:02Z

5

Why not just use str.match instead of extract and negate?

i.e. df[df['col'].str.match(r'^(S|H)')]

(I came here looking for the same answer, but the use of extract seemed odd, so I found the docs for str.ops.)
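As a minimal sketch of the str.match approach, the snippet below rebuilds a few rows of the question's table (a subset with a fresh index, chosen here only for illustration) and uses the boolean result as a row mask. Because str.match only matches at the start of each string, the leading ^ is optional, and the character class [HS] is an equivalent way to write S|H.

```python
import pandas as pd

# A small subset of the question's data, rebuilt for illustration only.
campaign_contributions = pd.DataFrame({
    "cid": ["N00031317", "N00027464", "N00024875", "N00031317", "N00030957"],
    "amount": [1000, 5000, 1000, 1000, 2000],
    "feccandid": ["H0FL19080", "H6IA01098", "S2IL08088", "P0FL19080", "H0TN04195"],
})

# str.match anchors at the beginning of each value and returns booleans.
mask = campaign_contributions["feccandid"].str.match(r"[HS]")

# Boolean indexing keeps only the rows where the mask is True.
selected = campaign_contributions[mask]
print(selected)
```

If the column can contain missing values, passing na=False to str.match keeps the mask strictly boolean.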

edited Nov 18, 2018 at 4:07

sbha

answered Mar 20, 2018 at 17:51

W D

1 Comment

craymichael Over a year ago

While both answers are functional, this is a much nicer solution.

Ami Tavory · Accepted Answer · 2016-07-29 16:19:16Z

2

For something this simple, you can bypass the regex:

relevant = campaign_contributions.feccandid.str.startswith('H') | \
    campaign_contributions.feccandid.str.startswith('S')
campaign_contributions[relevant]
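One caveat with the startswith approach is missing data. The sketch below assumes a column that contains a missing value (the sample frame is made up for illustration) and passes na=False so that missing entries count as non-matches and the combined mask stays boolean.

```python
import pandas as pd

# Hypothetical sample with one missing feccandid value.
df = pd.DataFrame({"feccandid": ["H0FL19080", "P0FL19080", None, "S2IL08088"]})
col = df["feccandid"]

# na=False makes a missing value evaluate to False instead of NaN.
mask = col.str.startswith("H", na=False) | col.str.startswith("S", na=False)

selected = df[mask]
print(selected)
```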

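However, if you want to use a regex, you can change this to

relevant = ~campaign_contributions['feccandid'].str.extract(r'^(S|H)', expand=False).isnull()

Note that the astype is redundant, and that extract is enough (extractall is not needed). The group must be a capturing one, and expand=False makes extract return a Series rather than a one-column DataFrame, so the result can be used directly as a boolean row mask.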
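To make the cause of the original error concrete, here is a small sketch on a made-up three-value Series. It shows that a non-capturing group (?:...) gives extract and extractall nothing to return, which is what triggers the ValueError, and then two regex variants that do produce a mask. The str.contains variant is an alternative not mentioned above; it returns booleans directly and works with the non-capturing pattern from the question.

```python
import pandas as pd

ids = pd.Series(["H0FL19080", "P6IA01098", "S2IL08088"])

# extractall (and extract) need at least one capturing group.
try:
    ids.str.extractall(r"^(?:S|H)")
    raised = False
except ValueError as err:
    raised = True
    print(err)

# With a capturing group, extract yields the matched letter or NaN.
# expand=False asks for a Series instead of a one-column DataFrame.
first_letter = ids.str.extract(r"^(S|H)", expand=False)
mask_extract = first_letter.notna()

# str.contains only tests for a match, so no capture group is needed.
mask_contains = ids.str.contains(r"^(?:S|H)")

print(ids[mask_extract])
```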

answered Jul 29, 2016 at 16:19

Ami Tavory
