2

I want to parse a complex SQL statement that contains joins (inner join, outer join) and get the names of the tables it uses.

I can get the table names from a simple select, but if the SQL has an inner join or left join like the one below, the result contains only the first table.

select * from xyz  inner join dhf  on df = hfj  where z > 100 

I am using a program similar to the one at the link below, by Paul.

http://pyparsing.wikispaces.com/file/view/select_parser.py/158651233/select_parser.py

Can someone tell me how to get all the tables used in a SQL statement like the one below?

select * from xyz  inner join dhf  on df = hfj  where z > 100
  • This may be a duplicate of stackoverflow.com/q/35295458/409172. That solution requires a live database and a PL/SQL stored procedure to do most of the work, and I'm not sure that's feasible for you. But it's probably the only way to correctly parse complex SQL. Even non-trivial Oracle SQL is almost impossible to parse. With 2175 keywords, most of them not reserved, parsing Oracle SQL is a huge task. That's why you need a shortcut, like the EXPLAIN PLAN method in that answer. Commented Aug 30, 2016 at 6:22
  • Pyparsing is no longer hosted on wikispaces.com. Go to github.com/pyparsing/pyparsing Commented Aug 27, 2018 at 12:43

2 Answers

1

This parser was written a long time ago, and handling multiple values in a results name did not come along until later.

Change this line in the parser you cited:

single_source = ( (Group(database_name("database") + "." + table_name("table")) | table_name("table")) + 

to

single_source = ( (Group(database_name("database") + "." + table_name("table*")) | table_name("table*")) + 

When I run your sample statement through the select_stmt parser, I now get this:

select * from xyz  inner join dhf  on df = hfj  where z > 100
['SELECT', ['*'], 'FROM', 'xyz', 'INNER', 'JOIN', 'dhf', 'ON', ['df', '=', 'hfj'], 'WHERE', ['z', '>', '100']]
- columns: ['*']
- table: [['xyz'], ['dhf']]
  [0]:
    ['xyz']
  [1]:
    ['dhf']
- where_expr: ['z', '>', '100']
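
To see the effect of the trailing '*' in isolation, here is a minimal sketch. It is not the select_parser.py grammar: the toy grammar and the build_parser helper are made up for illustration, and it assumes a pyparsing version that accepts the "name*" shorthand and the parseString method. It handles only "SELECT * FROM t" followed by simple joins.

```python
from pyparsing import CaselessKeyword, Optional, Word, ZeroOrMore, alphanums, alphas

SELECT, FROM, INNER, LEFT, JOIN, ON = map(
    CaselessKeyword, "SELECT FROM INNER LEFT JOIN ON".split()
)
identifier = Word(alphas, alphanums + "_")


def build_parser(results_name):
    """Toy grammar (hypothetical): SELECT * FROM t [[INNER|LEFT] JOIN t ON a = b]..."""
    # Calling an expression with a string returns a copy carrying that results name.
    table = identifier(results_name)
    join = Optional(INNER | LEFT) + JOIN + table + ON + identifier + "=" + identifier
    return SELECT + "*" + FROM + table + ZeroOrMore(join)


sql = "select * from xyz  inner join dhf  on df = hfj  where z > 100"

# Plain name: every new match replaces the previous one, so one table survives.
# parseString does not require the whole input to match, so the WHERE clause
# that this toy grammar does not know about is simply left unparsed.
last_only = build_parser("table").parseString(sql)["table"]

# Trailing "*": shorthand for setResultsName("table", listAllMatches=True),
# so every match is accumulated under the same name.
all_tables = list(build_parser("table*").parseString(sql)["table"])

print(last_only)
print(all_tables)
```

In this toy grammar the table name is a bare Word, so the accumulated matches are plain strings. In the modified select_parser.py the names come back nested, as in the dump above, where each entry is a one-element list.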

4 Comments

Thanks, Paul, for the reply. It works exactly as expected. I am getting all the table names from the SQL.
Is it possible to get all the join columns also from the SQL query?
Why "table*"? I can sort of tell it allows more than one, but I can't find any docs on it.
The behavior is described in the docs for setResultsName (pythonhosted.org/pyparsing/…) under the listAllMatches argument, and then the '*' is explained in the __call__ docstring (pythonhosted.org/pyparsing/…)
-1

The answer depends on which SQL platform you are using.

I will answer assuming you are using MS SQL. The same logic should work on all SQL platforms, though the syntax changes.

Tables are uniquely identified by the combination of owner and table name. In a Python script that I wrote to extract all data in all tables to text files, I run a select that returns #Owner#TableName#. Assuming you do not have multiple tables with the same name under different owners, the basic form is:

Select name from SysObjects where xtype = 'U' order by name

This gives you a list of all tables. Then you loop over that list, running "Select * from [table name from other query]" for each entry, until you have covered every table returned from SysObjects.

The same approach is practical on all SQL platforms, assuming you have access to the system tables.

2 Comments

Running select * from syscolumns can give you the column names.
You misread the question. The OP does not want to query the db for the table names, they want to extract the table names from the posted SQL statement.
