
I have about 80,000 lines of features describing the behavior of English prepositions, and I'm trying to characterize, e.g., the parts of speech associated with the preposition 'across'.

    samp = "across.p.cpa.312(2)c:l:whichc:pos:wdtc:ri:rulefired"
    print(re.search(sep + 'hr:pos:([a-z]+)' + sep, line))
    <re.Match object; span=(6840, 6852), match='\x18hr:pos:nns\x18'>

Note that '\x18' is the separator between features in the line. There are 1333 such features in a line of length 15942. But how do I get the match into a variable that I can then analyze further? This is easy to do in Perl, but Python seems to make it very difficult.

3
  • "how do I get the match out to a variable" <--- what do you mean by "the match"? Do you mean the match object that contains the matched string, where the match starts and ends and a bunch of other stuff, or do you mean just the matched string? Commented Jul 16, 2020 at 2:18
  • See below. I hope this answers your question, which sounded bad to begin. Commented Jul 16, 2020 at 2:29
  • Does this answer your question? Python extract pattern matches Commented Jul 16, 2020 at 2:31

2 Answers

1

search() returns a match object (re.Match), or None if nothing matched. Use the group() method to get the portion of the string that matched: group(0) returns the entire match, and group(1) returns the first capture group in the regex. You can also use indexing.

m = re.search(sep + 'hr:pos:([a-z]+)' + sep, line)

These return the whole match:

m.group(0)
m[0]

These return the 1st group in the match ('nns' in the example):

m.group(1)
m[1]
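
As a minimal sketch of the above, the snippet below builds a short, made-up feature line (the real lines are far longer), assumes the separator is the '\x18' character shown in the question's output, and guards against search() returning None before calling group():

```python
import re

sep = "\x18"  # assumed: the separator shown in the question's output
# Hypothetical, shortened feature line; the feature names are illustrative only.
line = sep.join(["", "l:which", "hr:pos:nns", "ri:rulefired", ""])

whole = pos = None
m = re.search(sep + r"hr:pos:([a-z]+)" + sep, line)
if m is not None:       # search() returns None when nothing matches
    whole = m.group(0)  # entire match, separators included
    pos = m.group(1)    # first capture group only: 'nns'

print(repr(whole), repr(pos))
```

Checking for None first avoids an AttributeError on lines that have no hr:pos feature.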

1 Comment

This is a good point, since I didn't mean to put the parentheses in the search. In fact, my intention was eventually to perform further analysis on just the different kinds of parts of speech that follow the preposition, such as "plural nouns" (nns) or "single proper noun" (nnp). Thanks for this.
0

Okay, I started again. Set m as below, then set pos to the whole match with group(0). (group(1) would give just the first group, 'nns'.)

  m = re.search(sep + 'hr:pos:([a-z]+)' + sep, line)
  pos = m.group(0)
  # pos is now '\x18hr:pos:nns\x18'

Boy, they don't make it easy to find out how to do this stuff.
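
For the further analysis mentioned in the question, one possible approach is to tally the captured part-of-speech tags across all lines. This is only a sketch: the four lines and their instance ids below are invented for illustration, and it assumes each line carries at most one hr:pos feature.

```python
import re
from collections import Counter

sep = "\x18"  # assumed separator, as in the question
# Compile once, since the same pattern is applied to many lines.
pattern = re.compile(re.escape(sep) + r"hr:pos:([a-z]+)" + re.escape(sep))

# Hypothetical stand-ins for the 80,000 real feature lines.
lines = [
    sep.join(["across.p.cpa.312(2)", "l:which", "hr:pos:nns", ""]),
    sep.join(["across.p.cpa.313(1)", "hr:pos:nnp", "ri:rulefired", ""]),
    sep.join(["across.p.cpa.314(1)", "hr:pos:nns", ""]),
    sep.join(["across.p.cpa.315(1)", "l:the", ""]),  # no hr:pos feature
]

pos_counts = Counter()
for line in lines:
    m = pattern.search(line)
    if m:  # skip lines without a match
        pos_counts[m.group(1)] += 1

print(pos_counts.most_common())
```

If a line can contain several hr:pos features, pattern.finditer(line) could replace search(), though adjacent features that share a single separator would need a lookahead in the pattern.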
