I want to analyze a SQL query to find out which hard-coded values belong to which column. For example, I have the following SQL query:

SELECT *
FROM (
      SELECT DISTINCT id 
              ,substring([data], 0, 497) AS [Instructions]
              ,'500' AS [Care_Code]
              ,cast(id AS VARCHAR) + cast(number AS VARCHAR) + 'pp' AS key
      FROM people
      WHERE ([data] LIKE '%communicated %')
   
     
      UNION ALL
     
      SELECT DISTINCT Patientid
              ,substring(pp, 0, 497) AS [Instructions]
              ,'500' AS [Care_Code]
              ,cast(id AS VARCHAR) + cast(number AS VARCHAR) + 'aa' AS key
      FROM people
      WHERE  Instructions LIKE '%[A-Z]%'

I want the output to look like this:

Harcoded_value    Column_Name
500               Care_Code
%communicated %   data
%[A-Z]%           Instructions

Example 2:-

Query :-

select distinct eid, count(distinct d.pid)   
 from SOAP s      
inner join demographics d on s.pid=d.pid     
inner join PS p on p.providerId=s.pid     
where      
p.npi in ('1316987761','1437366473','1912915638','1740253822')    
and Convert(datetime,Convert(varchar,EncounterDate,101)) >='08/01/2016'    
and  Convert(datetime,Convert(varchar,EncounterDate,101)) <= '07/31/2017'   
group by eid

Expected Output:-

 Harcoded_value                                              Column
  ('1316987761','1437366473','1912915638','1740253822')       p.npi

1 Answer

You can try a regular expression:

import re
import pandas as pd

s = """SELECT *
FROM (
      SELECT DISTINCT id 
              ,substring([data], 0, 497) AS [Instructions]
              ,'500' AS [Care_Code]
              ,cast(id AS VARCHAR) + cast(number AS VARCHAR) + 'pp' AS key
      FROM people
      WHERE ([data] LIKE '%communicated %')


      UNION ALL

      SELECT DISTINCT Patientid
              ,substring(pp, 0, 497) AS [Instructions]
              ,'500' AS [Care_Code]
              ,cast(id AS VARCHAR) + cast(number AS VARCHAR) + 'aa' AS key
      FROM people
      WHERE  Instructions LIKE '%[A-Z]%'
      and p.npi in ('1316987761','1437366473','1912915638','1740253822')
           """

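# Maps each hard-coded value to its column. Because the value is the key,
# a literal that appears under two different columns keeps only the last one.
results = {}
pattern = r"(([A-Za-z.]+ in )*(((\[.*\]|\w*) LIKE )*\(*'%*.+%*'\)*( AS (\w|\[.*\])*)*))"
for value in re.findall(pattern, s):
    # value[0] is the whole match, e.g. "'500' AS [Care_Code]"
    split_values = value[0].split(" ")
    val = "".join(split_values[2:])
    if "AS" in value[0] and split_values[2] != "key":
        # literal AS alias: the literal is the key, the alias (brackets removed) is the column
        results[re.sub("'|\"", "", split_values[0])] = re.sub(r"\W", "", val)
    elif "LIKE" in value[0] or "in" in value[0]:
        # column LIKE literal / column in (...): drop a trailing ")" left over from the WHERE clause
        val = val[:-1] if val[-1] == ")" and val[0] != "(" else val
        results[re.sub("'|\"", "", val)] = re.sub(r"\[|\]", "", split_values[0])


df = pd.DataFrame(list(results.items()), columns=["Harcoded_value", "Column_Name"])
print(df)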

Output

                                 Harcoded_value       Column_Name
                                            500         Care_Code
                                 %communicated%              data
                                        %[A-Z]%      Instructions
   (1316987761,1437366473,1912915638,1740253822)            p.npi

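This code uses a regular expression to find every quoted value in the query together with the column name before LIKE or in, or the alias after AS (aliases named key are skipped), and stores each pair in the results dictionary. Because each match is split on spaces and joined back without them, any space inside a literal is dropped.
After collecting all the values, it creates a DataFrame with the columns "Harcoded_value" and "Column_Name".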
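
If the single regular expression gets hard to maintain, one alternative is to use a separate pattern per construct. The following is only a sketch under some assumptions of mine, not a SQL parser: the names LITERAL, COLUMN, PATTERNS and hardcoded_values are hypothetical, a literal only counts as an aliased column when it directly follows a comma or SELECT (which is how the 'pp' concatenation gets skipped instead of checking for the alias key), and comparisons such as >= are not handled. It returns a list of pairs rather than a dictionary, so the same literal can appear under more than one column, and spaces inside literals are preserved.

```python
import re

# Simplifying assumptions: single-quoted literals ('' is an escaped quote),
# and column names that are either [bracketed] or plain/dotted identifiers.
LITERAL = r"'(?:[^']|'')*'"
COLUMN = r"\[[^\]]+\]|[\w.]+"

PATTERNS = [
    # , '500' AS [Care_Code]   (literal must start the select item)
    re.compile(
        rf"(?:,|\bSELECT\b(?:\s+DISTINCT\b)?)\s*(?P<value>{LITERAL})\s+AS\s+(?P<column>{COLUMN})",
        re.I,
    ),
    # [data] LIKE '%communicated %'
    re.compile(rf"(?P<column>{COLUMN})\s+(?:NOT\s+)?LIKE\s+(?P<value>{LITERAL})", re.I),
    # p.npi IN ('a','b')
    re.compile(
        rf"(?P<column>{COLUMN})\s+(?:NOT\s+)?IN\s*"
        rf"(?P<value>\(\s*{LITERAL}(?:\s*,\s*{LITERAL})*\s*\))",
        re.I,
    ),
]


def hardcoded_values(sql):
    """Return (value, column) pairs in order of first appearance, without duplicates."""
    hits = []
    for pattern in PATTERNS:
        for m in pattern.finditer(sql):
            value = m.group("value")
            if not value.startswith("("):  # single literal: drop the outer quotes
                value = value[1:-1]
            column = m.group("column").strip("[]")
            hits.append((m.start(), value, column))

    pairs = []
    for _, value, column in sorted(hits):
        if (value, column) not in pairs:
            pairs.append((value, column))
    return pairs


sql = """
SELECT DISTINCT id
      ,'500' AS [Care_Code]
      ,cast(id AS VARCHAR) + 'pp' AS key
FROM people
WHERE ([data] LIKE '%communicated %')
  AND p.npi IN ('1316987761','1437366473')
"""

for value, column in hardcoded_values(sql):
    print(f"{value:<32} {column}")
```

The list of pairs can be passed to pd.DataFrame(pairs, columns=["Harcoded_value", "Column_Name"]) to get the same table layout as above.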

2 Comments

Thanks @Leo for the amazing answer. Can I do it the same way for the IN keyword as well?
Thanks for this beautiful solution
