5

I want to extract the rows whose feccandid value starts with H or S:

    cid     amount  date    catcode     feccandid
0   N00031317   1000    2010    B2000   H0FL19080
1   N00027464   5000    2009    B1000   H6IA01098
2   N00024875   1000    2009    A5200   S2IL08088
3   N00030957   2000    2010    J2200   S0TN04195
4   N00026591   1000    2009    F3300   S4KY06072
5   N00031317   1000    2010    B2000   P0FL19080
6   N00027464   5000    2009    B1000   P6IA01098
7   N00024875   1000    2009    A5200   S2IL08088
8   N00030957   2000    2010    J2200   H0TN04195
9   N00026591   1000    2009    F3300   H4KY06072

I am using this code:

campaign_contributions.loc[campaign_contributions['feccandid'].astype(str).str.extractall(r'^(?:S|H)')]

It raises the error: ValueError: pattern contains no capture groups

Does anyone with experience using Regex know what I am doing wrong?

2 Answers

5

Why not just use str.match instead of extract and negate?

i.e., df[df['col'].str.match(r'^(S|H)')]

(I came here looking for the same answer, but the use of extract seemed odd, so I found the docs for str.ops.)
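As a minimal sketch of the str.match approach, here it is applied to a few rows modeled on the question's data. The DataFrame below is a shortened stand-in for campaign_contributions, and the character class [SH] is just an equivalent way of writing (S|H). Since str.match only matches at the start of each string, the ^ anchor is optional.

```python
import pandas as pd

# Shortened stand-in for the question's DataFrame (assumed sample rows).
campaign_contributions = pd.DataFrame({
    "cid": ["N00031317", "N00027464", "N00024875", "N00031317", "N00027464"],
    "feccandid": ["H0FL19080", "H6IA01098", "S2IL08088", "P0FL19080", "P6IA01098"],
})

# str.match tests the pattern at the beginning of each string and
# returns a boolean Series that can be used directly as a row filter.
mask = campaign_contributions["feccandid"].str.match(r"[SH]")
selected = campaign_contributions[mask]
print(selected)
```

Only the rows whose feccandid starts with H or S are kept; the two P rows are dropped.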


1 Comment

While both answers are functional, this is a much nicer solution.
2

For something this simple, you can bypass the regex:

relevant = campaign_contributions.feccandid.str.startswith('H') | \
    campaign_contributions.feccandid.str.startswith('S')
campaign_contributions[relevant]
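If the column might contain missing values (an assumption, since the question's sample has none), str.startswith returns a missing value for those rows, which cannot be used in a boolean filter. A hedged sketch using the na=False argument to treat missing entries as non-matches, on made-up sample data:

```python
import pandas as pd

# Assumed sample data with one missing feccandid value.
campaign_contributions = pd.DataFrame({
    "amount": [1000, 5000, 1000, 2000],
    "feccandid": ["H0FL19080", None, "S2IL08088", "P6IA01098"],
})

col = campaign_contributions["feccandid"]

# na=False makes missing entries evaluate to False instead of NaN,
# so the combined mask is purely boolean.
relevant = col.str.startswith("H", na=False) | col.str.startswith("S", na=False)
selected = campaign_contributions[relevant]
print(selected)
```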

However, if you want to use a regex, you can change this to

relevant = ~campaign_contributions['feccandid'].str.extract(r'^(S|H)', expand=False).isnull()

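Note that the astype(str) is redundant if the column already holds strings, and that extract is enough; extractall is not needed. The original error occurs because (?:S|H) is a non-capturing group, while extract and extractall require at least one capturing group, such as (S|H).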
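To illustrate the difference, here is a small sketch on an assumed three-value Series. It shows the non-capturing pattern raising the ValueError from the question, and the capturing version producing a usable mask. Passing expand=False is an assumption about recent pandas versions, where extract otherwise returns a DataFrame rather than a Series.

```python
import pandas as pd

# Assumed sample values modeled on the question's feccandid column.
s = pd.Series(["H0FL19080", "P0FL19080", "S2IL08088"])

# A non-capturing group (?:...) gives extract nothing to return,
# so pandas raises ValueError.
error_message = None
try:
    s.str.extract(r"^(?:S|H)")
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# With a capturing group, extract returns the matched letter or NaN.
# expand=False yields a Series, which works as a row mask after notna().
first_letter = s.str.extract(r"^(S|H)", expand=False)
mask = first_letter.notna()
print(s[mask])
```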
