I am trying to generate a SQL statement with Python. Please check the script below:

import re

json_file_object = open("sample_json_paths.txt", "r")
sql = list()

for sample_text in json_file_object:

# sample_text = "$.testABM.test.test.test.test.test.testReference"

    sql.append("SELECT\n")
    sql.append("            CONVERT(NVARCHAR(32), HashBytes('MD5', concat(rt.id,'_', @now)), 2) AS docid\n")


    #Append parentnodeid row
    remove_dollar_sign = sample_text.replace("$.","")
    json_string_list = remove_dollar_sign.split(".")
    nodeid = json_string_list.pop(-1)
    parent_node_id = ".".join(json_string_list)
    sql.append("            ,'" + parent_node_id + "' " + "AS parentnodeid\n")

    #Append nodeid row
    sql.append("            ,'" + nodeid + "' " + "AS nodeid\n")

    sql.append("            ,testABM_layer.RequestHeader AS RequestHeader\n")
    sql.append("            ,final_layer.[key] AS final_layer_key\n")
    sql.append("            ,final_layer.[value] AS Ofinal_layer_value\n")
    sql.append("FROM        dbo.test_dataset_backup rt")
    sql.append("""
                OUTER APPLY OPENJSON ( rt.test) AS layer_root
                OUTER APPLY OPENJSON ( layer_root.value ) 
                    WITH    (
                                    [RequestHeader] NVARCHAR(MAX) AS json,
                                    [test] NVARCHAR(MAX) AS json
                            ) AS testABM_layer\n""")

    #append OUTER APPLY with json sub path
    remove_leading_json = sample_text.replace(".testABM.test","")
    json_string_list = remove_leading_json.split(".")
    json_string_list.pop(-1)
    json_sub_path = ".".join(json_string_list)
    sql.append("            OUTER APPLY OPENJSON ( testABM_layer.test, " + "'" + json_sub_path + "'" + ") AS final_layer\n")
    sql.append("            WHERE final_layer.[key] = '" + nodeid + "'\n")

    sql.append("UNION ALL\n")

# print(json_sub_path)

sql_output = "".join(sql)
f = open("sql_statements.txt", "a")
f.write(sql_output)
f.close()

print(sql_output)

An unwanted line break occurs at [removed due to sensitive information] and [removed due to sensitive information], before the UNION ALL. The issue does not occur in the last loop iteration. The sample below, generated from the input file, shows the problem:

SELECT
            CONVERT(NVARCHAR(32), HashBytes('MD5', concat(rt.id,'_', @now)), 2) AS docid
            ,'testABM.test.test.test' AS parentnodeid
            ,'testReference
' AS nodeid
            ,testABM_layer.RequestHeader AS RequestHeader
            ,final_layer.[key] AS final_layer_key
            ,final_layer.[value] AS Ofinal_layer_value
FROM        dbo.test_dataset_backup rt
                OUTER APPLY OPENJSON ( rt.test) AS layer_root
                OUTER APPLY OPENJSON ( layer_root.value ) 
                    WITH    (
                                    [RequestHeader] NVARCHAR(MAX) AS json,
                                    [test] NVARCHAR(MAX) AS json
                            ) AS testABM_layer
            OUTER APPLY OPENJSON ( testABM_layer.test, '$.test.test') AS final_layer
            WHERE final_layer.[key] = 'testReference
'
UNION ALL

How can I fix this issue?

Thanks

  • For whichever database(s) you're working with, there should already be a tested library that handles parameter substitution in a way that guards against SQL injection. Is there a particular reason you're doing this without using the DB-API approach? It may seem that I'm asking a tangential question, but if you use one of those libraries there's a chance you wouldn't have the issue you're seeing. Commented Jan 12, 2022 at 16:15
  • @mechanical_meat, I am using Synapse DW in Azure. The purpose is to insert the data from JSON nodes into Synapse tables, so this is the solution I can think of for now. Commented Jan 12, 2022 at 16:28

1 Answer

When iterating over the file contents, sample_text contains a newline character at the end, e.g. '$.SyncCustomerRequestABM.SyncCustomerRequest.Customers.Customer.OriginalSystemReference\n'

The issue is that nodeid is obtained by splitting the line on "." and taking the last element of the split, so it keeps that trailing newline and carries it into the generated SQL.

You can fix the problem by stripping the sample_text at the beginning of each iteration:

sample_text = sample_text.strip()

It worked for the last line because that line does not end with a newline character in your file.
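To make the effect visible, here is a small sketch that reproduces it without touching the real file. The two paths and the io.StringIO stand-in are made up for illustration; only the replace/split logic is taken from the question.

```python
import io

# Hypothetical stand-in for sample_json_paths.txt: two lines,
# the last one WITHOUT a trailing newline (as in the question's file).
fake_file = io.StringIO(
    "$.testABM.test.test.testReference\n"
    "$.testABM.test.test.otherReference"
)

raw_nodeids = []
clean_nodeids = []
for sample_text in fake_file:
    # Without stripping, the last path segment keeps the line's "\n".
    raw_nodeids.append(sample_text.replace("$.", "").split(".")[-1])
    # Stripping first removes the newline before the split.
    clean_nodeids.append(sample_text.strip().replace("$.", "").split(".")[-1])

print(raw_nodeids)    # ['testReference\n', 'otherReference']
print(clean_nodeids)  # ['testReference', 'otherReference']
```

Only the first entry of raw_nodeids carries the newline, which matches the observation that the last loop iteration looked fine.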


This will help you with your existing code, but I also strongly suggest looking into better ways of generating these strings.
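One possible direction, offered as a rough sketch rather than a definitive implementation, is to keep the statement in a single template and fill it with str.format. The names SELECT_TEMPLATE, build_select and build_sql are hypothetical, the template is a condensed copy of the statement from the question, and the example paths are invented. Values are still interpolated as SQL literals, so this assumes the input file is trusted.

```python
# Hypothetical template modeled on the statement built in the question.
SELECT_TEMPLATE = """\
SELECT
    CONVERT(NVARCHAR(32), HashBytes('MD5', concat(rt.id,'_', @now)), 2) AS docid
    ,'{parent_node_id}' AS parentnodeid
    ,'{nodeid}' AS nodeid
    ,testABM_layer.RequestHeader AS RequestHeader
    ,final_layer.[key] AS final_layer_key
    ,final_layer.[value] AS Ofinal_layer_value
FROM dbo.test_dataset_backup rt
    OUTER APPLY OPENJSON ( rt.test ) AS layer_root
    OUTER APPLY OPENJSON ( layer_root.value )
        WITH (
            [RequestHeader] NVARCHAR(MAX) AS json,
            [test] NVARCHAR(MAX) AS json
        ) AS testABM_layer
    OUTER APPLY OPENJSON ( testABM_layer.test, '{json_sub_path}' ) AS final_layer
WHERE final_layer.[key] = '{nodeid}'
"""


def build_select(json_path):
    """Build one SELECT for a path such as '$.testABM.test.a.b'."""
    path = json_path.strip()  # drop the trailing newline first
    parts = path.replace("$.", "", 1).split(".")
    nodeid = parts[-1]
    parent_node_id = ".".join(parts[:-1])
    # Modeled on the question's rule: drop ".testABM.test" and the last segment.
    sub_parts = path.replace(".testABM.test", "", 1).split(".")[:-1]
    return SELECT_TEMPLATE.format(
        parent_node_id=parent_node_id,
        nodeid=nodeid,
        json_sub_path=".".join(sub_parts),
    )


def build_sql(paths):
    """Join one SELECT per non-empty line with UNION ALL between them."""
    statements = [build_select(p) for p in paths if p.strip()]
    return "UNION ALL\n".join(statements)


# Invented example input, including the trailing newlines a file would yield.
paths = [
    "$.testABM.test.test.test.testReference\n",
    "$.testABM.test.test.otherReference\n",
]
sql_output = build_sql(paths)
print(sql_output)
```

Joining the statements with "UNION ALL\n" also avoids the dangling UNION ALL that the append-based loop leaves after the final SELECT.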
