Setting a variable to a matched regex in Python

Question

I have 80,000 lines of features describing the behavior of English prepositions, and I'm trying to characterize, e.g., the parts of speech for the preposition 'across'.

    samp = "across.p.cpa.312(2)c:l:whichc:pos:wdtc:ri:rulefired"
    print(re.search(sep + 'hr:pos:([a-z]+)' + sep, line))
    <re.Match object; span=(6840, 6852), match='\x18hr:pos:nns\x18'>

Note that '\x18' is the separator within the line. There are 1333 such features in a line of length 15942. But how do I get the match into a variable so that I can do more analysis on it? This is easy to do in Perl, but Python seems to make it very difficult.

"how do I get the match out to a variable" <--- what do you mean by "the match"? Do you mean the match object, which contains the matched string, where the match starts and ends, and a bunch of other stuff, or do you mean just the matched string?
– Sweeper, Commented Jul 16, 2020 at 2:18
See below. I hope this answers your question, which sounded bad to begin with.
– KenLit, Commented Jul 16, 2020 at 2:29
Does this answer your question? Python extract pattern matches
– Sweeper, Commented Jul 16, 2020 at 2:31

RootTwo · Accepted Answer · 2020-07-16 02:50:06Z

search() returns a Match object, or None if nothing matched. Use the group() method to get the portion of the string that matched: group(0) returns the entire match, and group(1) returns the first group in the regex. You can also use indexing.

m = re.search(sep + 'hr:pos:([a-z]+)' + sep, line)

These return the whole match:

m.group(0)
m[0]

These return the 1st group in the match ('nns' in the example):

m.group(1)
m[1]
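
Since search() returns None when the pattern is not found, calling group() directly can raise AttributeError on lines that lack the feature. Below is a minimal sketch of pulling the captured tag into a variable with that check in place. The sample line and the helper name extract_pos are made up for illustration; only the pattern and the '\x18' separator come from the question.

```python
import re

SEP = "\x18"  # separator character described in the question

# Hypothetical sample line, built only for illustration.
line = SEP.join(["", "l:which", "hr:pos:nns", "ri:rulefired", ""])

# Compile once, since the same pattern is applied to many lines.
pattern = re.compile(re.escape(SEP) + r"hr:pos:([a-z]+)" + re.escape(SEP))

def extract_pos(text):
    """Return the captured part-of-speech tag, or None if the feature is absent."""
    m = pattern.search(text)
    if m is None:
        return None
    return m.group(1)  # just the captured group, without the separators

pos = extract_pos(line)
print(pos)
```

Here pos holds only the text captured by the parentheses, so it can be compared or counted directly.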



1 Comment

KenLit Over a year ago

This is a good point, since I didn't mean to put the parentheses in the search. In fact, my intention was eventually to perform further analysis on just the different kinds of parts of speech that follow the preposition, such as "plural nouns" (nns) or "single proper noun" (nnp). Thanks for this.
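
For the follow-up analysis mentioned in this comment, one possible approach is to tally the captured tags across all lines with collections.Counter. This is only a sketch: the sample lines and the function name count_pos are hypothetical, and it counts just the first hr:pos feature found in each line.

```python
import re
from collections import Counter

SEP = "\x18"  # separator character described in the question
POS_RE = re.compile(SEP + r"hr:pos:([a-z]+)" + SEP)

def count_pos(lines):
    """Count the first hr:pos tag found in each line; skip lines without one."""
    counts = Counter()
    for line in lines:
        m = POS_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

# Hypothetical feature lines, built only for illustration.
sample_lines = [
    SEP.join(["across.p", "hr:pos:nns", "ri:rulefired", ""]),
    SEP.join(["across.p", "hr:pos:nnp", "ri:rulefired", ""]),
    SEP.join(["across.p", "hr:pos:nns", ""]),
    SEP.join(["across.p", "l:which", ""]),  # no hr:pos feature
]

for tag, n in count_pos(sample_lines).most_common():
    print(tag, n)
```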

KenLit · Accepted Answer · 2020-07-16 02:27:30Z

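Okay, I started again. Set m as below, then set pos from the match. Note that group(0) is the whole match, separators included; group(1) would give just the captured part of speech ('nns').

  m = re.search(sep + 'hr:pos:([a-z]+)' + sep, line)
  pos = m.group(0)
  # pos is now '\x18hr:pos:nns\x18'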

Boy, they don't make it easy to find out how to do this stuff.

