0

I'm writing a regular expression code to extract items from a SQL statement. I want to pull out a tuple of the items after SELECT * FROM and WHERE statements.

pattern = '''
    SELECT\s\*\sFROM\s ([a-zA-Z\s*\,*\=\.]+) \sWHERE \s*([a-zA-Z\d*\s*\,*\=*\.*]+)'''

From the pattern above, I'm using it for the SQL statement below, and get the following results:

('A, B, C\n           ', 'A.ColA = B.ColB AND B.ColB2=C.ColC')

I know it's because of the huge break before the WHERE, but how do I factor that in and get rid of the huge space and /n after my first tuple?

q = """SELECT * FROM A, B, C
            WHERE A.ColA = B.ColB AND B.ColB2=C.ColC"""

1 Answer 1

1

You can loop over the returned result and use re.sub:

import re
results = ('A, B, C\n           ', 'A.ColA = B.ColB AND B.ColB2=C.ColC')
new_results = tuple(re.sub('\n\s+$', '', i) for i in results)

Output:

('A, B, C', 'A.ColA = B.ColB AND B.ColB2=C.ColC')

Or, you can alter your pattern:

import re
q = """SELECT * FROM A, B, C
        WHERE A.ColA = B.ColB AND B.ColB2=C.ColC"""
new_data = re.findall('(?<=SELECT \* FROM\s)[A-Z\s,]+(?=\n)|(?<=WHERE\s)[\w\W]+', q)

Output:

['A, B, C', 'A.ColA = B.ColB AND B.ColB2=C.ColC']
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks! Appreciate the added approach of looping it over. And also love the cleanliness of the pattern alternation
@GenXeral glad to help! Please accept the answer if it helped you. Thank you!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.