
I'm trying to parse SQL code and extract all the table names from it.

In the SQL, table names are sometimes written with an alias, sometimes qualified (schema.table or database.schema.table), and sometimes as just the table name.

I'm using the sqlparse package, but I only get the table aliases. Here is my query:

SELECT  *
FROM VA_ASSISTES va, datamart_Sales.dbo.Seller f,
datamart_Sales.dbo.ARTICLES a, SITE s, datamart_Sales.dbo.TEMPS t

This is what I currently get as a result, only the aliases:

Tables: va, f, a, s, t

However, I want to retrieve the table names instead, like this: datamart_Sales.dbo.Seller, datamart_Sales.dbo.ARTICLES, datamart_Sales.dbo.TEMPS, SITE

I would really appreciate help extracting the table names in all the cases mentioned above.

1 Answer


sqlparse does not label identifiers as column names/aliases or table names/aliases. You therefore have to loop over the parsed tokens, note when a FROM keyword occurs, and keep the identifiers that follow it. To handle nested queries, the function also recurses into grouped tokens such as parenthesised subqueries:

import sqlparse

s = """SELECT  *
FROM VA_ASSISTES va, datamart_Sales.dbo.Seller f,
datamart_Sales.dbo.ARTICLES a, SITE s, datamart_Sales.dbo.TEMPS t"""

# Nested query from the comments, reproduced as written
s1="""SELECT job_id,AVG(salary) FROM VA_ASSISTES va, datamart_Sales.dbo.Seller f, datamart_Sales.dbo.ARTICLES a, SITE s, datamart_Sales.dbo.TEMPS t Havning job_id,AVG(salary)< (SELECT MAX(AVG(min_salary)) FROM jobs WHERE job_id IN (SELECT job_id FROM job_history WHERE department_id BETWEEN 50 AND 100) GROUP BY job_id));"""

def get_tables(tokens):
    """Yield a [name, alias] list for each identifier that follows FROM."""
    after_from = False  # set when FROM is seen at this nesting level
    for token in tokens:
        if token.value.lower() == 'from':
            after_from = True
        if after_from and isinstance(token, (sqlparse.sql.Identifier, sqlparse.sql.IdentifierList)):
            if isinstance(token, sqlparse.sql.IdentifierList):
                # "a x, b y, ..." is grouped as one IdentifierList: one entry per table
                yield from [j.value.split() for j in token.get_identifiers()]
            else:
                yield token.value.split()
            after_from = False
        # Recurse into grouped tokens (e.g. a parenthesised subquery with its own FROM)
        if token.is_group:
            yield from get_tables(token)

print(list(get_tables(sqlparse.parse(s)[0])))
print(list(get_tables(sqlparse.parse(s1)[0])))

Output:

[['VA_ASSISTES', 'va'], ['datamart_Sales.dbo.Seller', 'f'], ['datamart_Sales.dbo.ARTICLES', 'a'], ['SITE', 's'], ['datamart_Sales.dbo.TEMPS', 't']]
[['VA_ASSISTES', 'va'], ['datamart_Sales.dbo.Seller', 'f'], ['datamart_Sales.dbo.ARTICLES', 'a'], ['SITE', 's'], ['datamart_Sales.dbo.TEMPS', 't'], ['jobs'], ['job_history']]
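
The question asks for just the table names, not [name, alias] pairs. A minimal sketch for that last step, assuming the table name is always the first whitespace-separated part of each entry (true for "name alias" and "name AS alias", but not for quoted names containing spaces), keeps the first element and drops duplicates in first-seen order. The helper name table_names is hypothetical, and the input list is copied from the first output line above so the snippet runs without sqlparse:

```python
# [name, alias] lists as printed by get_tables() for the first query
pairs = [['VA_ASSISTES', 'va'], ['datamart_Sales.dbo.Seller', 'f'],
         ['datamart_Sales.dbo.ARTICLES', 'a'], ['SITE', 's'],
         ['datamart_Sales.dbo.TEMPS', 't']]


def table_names(pairs):
    """Hypothetical helper: keep only the table name from each entry.

    Assumes the name is the first element (also holds for
    'name AS alias', which splits into [name, 'AS', alias]).
    dict.fromkeys drops duplicates while preserving order.
    """
    return list(dict.fromkeys(parts[0] for parts in pairs))


print(", ".join(table_names(pairs)))
# VA_ASSISTES, datamart_Sales.dbo.Seller, datamart_Sales.dbo.ARTICLES, SITE, datamart_Sales.dbo.TEMPS
```

In practice you would pass list(get_tables(...)) directly instead of the hard-coded pairs.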

2 Comments

Thank you for the prompt reply! What if I have nested queries like this: SELECT job_id,AVG(salary) FROM VA_ASSISTES va, datamart_Sales.dbo.Seller f, datamart_Sales.dbo.ARTICLES a, SITE s, datamart_Sales.dbo.TEMPS t Havning job_id,AVG(salary)< SELECT MAX(AVG(min_salary)) FROM jobs WHERE job_id IN (SELECT job_id FROM job_history WHERE department_id BETWEEN 50 AND 100) GROUP BY job_id);
@Abdelhak You will need to use recursion. Please see my recent edit.
