I am trying to extract the second match of the pattern "LOCATION \s+\S+" from the following text:

 PAGE    1

                BID OPENING DATE    07/25/18    FROM 0.2 MILES WEST OF ICE HOUSE        07/26/18 CONTRACT NUMBER    03-2F1304   ROAD TO 0.015 MILES WEST OF CONTRACT CODE 'A '

            LOCATION    03-ED-50-39.5/48.7  DIVISION HIGHWAY ROAD   44 CONTRACT ITEMS

        INSTALL SANDTRAPS AND PULLOUTS  FEDERAL AID ACNH-P050-(146)E

PAGE    1

                    BID OPENING DATE    07/25/18    IN EL DORADO COUNTY AT VARIOUS          07/26/18 CONTRACT NUMBER     03-2H6804  LOCATIONS ALONG ROUTES 49 AND 193   CONTRACT CODE 'C ' LOCATION 03-ED-0999-VAR          13 CONTRACT ITEMS



        TREE REMOVAL    FEDERAL AID NONE

PAGE    1

                BID OPENING DATE    07/25/18    IN LOS ANGELES, INGLEWOOD AND       07/26/18 CONTRACT NUMBER    07-296304   CULVER CITY, FROM I-105 TO PORT CONTRACT CODE 'B '

            LOCATION    07-LA-405-R21.5/26.3    ROAD UNDERCROSSING  55 CONTRACT ITEMS



        ROADWAY SAFETY IMPROVEMENT  FEDERAL AID ACIM-405-3(056)E

I am trying to get LOCATION 03-ED-0999-VAR (the second match) from the text. Is there a way to specify that I want the second, third, or nth match in Python? Right now, I have the following code:

# imports
import os
import pandas as pd
import re
import docx2txt
import textract
import antiword

# Triple quotes let the literal span several lines and contain single quotes
text = """ PAGE    1

                BID OPENING DATE    07/25/18    FROM 0.2 MILES WEST OF ICE HOUSE        07/26/18 CONTRACT NUMBER    03-2F1304   ROAD TO 0.015 MILES WEST OF CONTRACT CODE 'A '

            LOCATION    03-ED-50-39.5/48.7  DIVISION HIGHWAY ROAD   44 CONTRACT ITEMS

        INSTALL SANDTRAPS AND PULLOUTS  FEDERAL AID ACNH-P050-(146)E

PAGE    1

                    BID OPENING DATE    07/25/18    IN EL DORADO COUNTY AT VARIOUS          07/26/18 CONTRACT NUMBER     03-2H6804  LOCATIONS ALONG ROUTES 49 AND 193   CONTRACT CODE 'C ' LOCATION 03-ED-0999-VAR          13 CONTRACT ITEMS



        TREE REMOVAL    FEDERAL AID NONE

PAGE    1

                BID OPENING DATE    07/25/18    IN LOS ANGELES, INGLEWOOD AND       07/26/18 CONTRACT NUMBER    07-296304   CULVER CITY, FROM I-105 TO PORT CONTRACT CODE 'B '

            LOCATION    07-LA-405-R21.5/26.3    ROAD UNDERCROSSING  55 CONTRACT ITEMS



        ROADWAY SAFETY IMPROVEMENT  FEDERAL AID ACIM-405-3(056)E"""

location1 = re.search(r'LOCATION \s+\S+', text)  # returns only the first match

1 Answer

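re.search() returns only the first match. Use re.findall() instead: it returns all non-overlapping matches as a list, so you can index whichever one you want and count them with len(). Note that "LOCATION \s+\S+" requires a literal space plus at least one more whitespace character after LOCATION, so it skips the single-space "LOCATION 03-ED-0999-VAR" in your sample. Dropping the literal space, as in "LOCATION\s+\S+", accepts one or more whitespace characters and picks up all three occurrences.

location1 = re.findall(r"LOCATION\s+\S+", text)
print(len(location1)) # Number of matches (3 for the sample text)
print(location1[1]) # Second match (list indices start at 0)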
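
If you only need one particular match, a possible alternative is re.finditer() combined with itertools.islice(), which stops scanning once the nth match is reached instead of building the whole list. The following is a minimal sketch, not a definitive implementation: the helper name nth_match and the shortened sample string are made up for illustration, and the pattern assumes one or more whitespace characters directly after LOCATION.

```python
import re
from itertools import islice


def nth_match(pattern, text, n):
    """Return the n-th match (1-based) of pattern in text, or None if there are fewer than n."""
    matches = re.finditer(pattern, text)           # lazy iterator over match objects
    match = next(islice(matches, n - 1, None), None)  # skip n-1 matches, take the next one
    return match.group(0) if match else None


# Shortened, hypothetical stand-in for the bid text in the question
sample = (
    "LOCATION    03-ED-50-39.5/48.7 ... LOCATIONS ALONG ROUTES ... "
    "LOCATION 03-ED-0999-VAR ... LOCATION    07-LA-405-R21.5/26.3"
)

print(nth_match(r"LOCATION\s+\S+", sample, 2))  # second match
print(nth_match(r"LOCATION\s+\S+", sample, 5))  # fewer than 5 matches, so None
```

Here n is 1-based and must be at least 1. Returning None for a missing match avoids the IndexError that indexing the findall() list would raise when there are too few matches.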