I'm trying, without success, to pass a JSON string to a Python script from a PowerShell script (.ps1) in order to automate this task.

spark-submit `
--driver-memory 8g `
--master local[*] `
--conf spark.driver.bindAddress=127.0.0.1 `
--packages mysql:mysql-connector-java:6.0.6,org.elasticsearch:elasticsearch-spark-20_2.11:7.0.0 `
--py-files build/dependencies.zip build/main.py `
$param

When $param='{ \"\"job_start\"\": \"\"jdbc:mysql://127.0.0.1:3307/test\"\"}', it works fine: Python receives a valid JSON string and parses it correctly.

When I use the character &, as in $param='{ \"\"job_start\"\": \"\"jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&autoReconnect=true&useSSL=false\"\"}', the string is printed as { "job_start": \jdbc:mysql://127.0.0.1:3307/test? and the rest of the string is recognized as other commands:

'serverTimezone' is not recognized as an internal or external command
'autoReconnect' is not recognized as an internal or external command
'useSSL' is not recognized as an internal or external command

The \"\" is needed to preserve the double quotes in the Python script; I'm not sure why two escaped double quotes are needed.

UPDATE:

Now I'm having problems with the ! character: I can't escape it even with ^ or \.

# Only "" doesn't work
$param='{\"\"job_start\"\": \"\"jdbc:mysql://127.0.0.1:3307/test^&serverTimezone=UTC\"\", \"\"password\"\": \"\"testpassword^!123\"\"}'

spark-submit.cmd `
--driver-memory 8g `
--master local[*] `
--conf spark.driver.bindAddress=127.0.0.1 `
--packages mysql:mysql-connector-java:6.0.6,org.elasticsearch:elasticsearch-spark-20_2.11:7.0.0 `
--py-files build/dependencies.zip build/main.py `
$param

# OUTPUT: misses the ! character
{"job_start": "jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC", "password": "testpassword123"}

Thank you all.

  • I wonder if spark-submit has a different context than a simple Python script? Commented May 6, 2020 at 12:34
  • It would be good to understand whether there truly is a problem with how spark-submit relays arguments to Python, or whether the problem is unique to your scenario / environment. Also, in your update you refer to output: who produces that output? Commented May 9, 2020 at 14:03
  • @mklement0 Yes, I'm trying to figure out what happens with spark-submit, because passing the string directly to Python works with your explanations. As for the output, it comes from a print function at the beginning of my Spark script. Commented May 10, 2020 at 20:09

2 Answers

tl;dr

Note: The following does not solve the OP's specific problem (the cause of which is still unknown), but hopefully contains information of general interest.

# Use "" to escape " and - in case of delayed expansion - ^! to escape !
$param = '{ ""job_start"": ""jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&more^!"" }'
  • There are high-profile utilities (CLIs) such as az (Azure) that are Python-based, but on Windows use an auxiliary batch file as the executable that simply relays arguments to a Python script.
    • Use Get-Command az, for instance, to discover an executable's full file name; batch files, which are processed by cmd.exe, the legacy command processor, have a filename extension of either .cmd or .bat
  • To prevent calls to such a batch file from breaking, double quotes embedded in arguments passed from PowerShell must be escaped as ""
  • Additionally, but only if setlocal enabledelayedexpansion is in effect in a given target batch file or if your computer is configured to use delayed expansion by default, for all batch files:
    • ! characters must be escaped as ^!, which, however, is only effective if cmd.exe considers the ! part of a double-quoted string.

It looks like we have a confluence of two problems:

  • A PowerShell problem with " chars. embedded in arguments passed to external programs:

    • In an ideal world, passing JSON text such as '{ "foo": "bar" }' to an external program would work as-is, but due to PowerShell's broken handling of embedded double quotes, that is not enough, and the " chars. must additionally be escaped, for the target program, either as \" (which most programs support), or, in the case of cmd.exe (see below), as "", which Python fortunately recognizes too: '{ ""foo"": ""bar"" }'
  • Limitations of argument-passing and escaping in cmd.exe batch files:

    • It sounds like spark-submit is an auxiliary batch file (.cmd or .bat) that passes the arguments through to a Python script.

    • The problem is that if you use \" for escaping embedded ", cmd.exe doesn't recognize them as escaped, which causes it to consider the & characters unquoted, and they are therefore interpreted as shell metacharacters, i.e. as characters with special syntactic function (command sequencing, in this case).

    • Additionally, and only if setlocal enabledelayedexpansion is in effect in a given batch file, any literal ! characters in arguments require additional handling:

      • If cmd.exe thinks the ! is part of an unquoted argument, you cannot escape ! at all.

      • Inside a quoted argument (which invariably means "..." in cmd.exe), you must escape a literal ! as ^!.

        • Note that this requirement is the inverse of how all other metacharacters must be escaped (which require ^ when unquoted, but not inside "...").

        • The unfortunate consequence is that you need to know the implementation details of the target batch file - whether it uses setlocal enabledelayedexpansion or not - in order to formulate your arguments properly.

        • The same applies if your computer is configured to use delayed expansion by default, for all batch files (and interactively), which is neither common nor advisable. To test whether a given computer is configured that way, run the following command and look for DelayedExpansion : 1 in its output: if there is no output at all, delayed expansion is OFF; if there are one or two outputs, delayed expansion is ON by default only if the first or only output reports DelayedExpansion : 1.

 Get-ItemProperty -EA Ignore 'registry::HKEY_CURRENT_USER\Software\Microsoft\Command Processor', 'registry::HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor' DelayedExpansion
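To make the difference between \" and "" escaping more concrete, here is a minimal Python sketch of a cmd.exe-style quote scanner. It is a deliberately simplified model, not cmd.exe's actual parser: it assumes that every " toggles the quoted state, that a backslash has no special meaning, and that ^ escapes the next character only outside quotes. The function name unquoted_ampersands and the two sample strings (assumed to already be wrapped in outer double quotes) are hypothetical.

```python
def unquoted_ampersands(raw: str) -> list:
    """Return the indexes of & characters that a simplified
    cmd.exe-style scanner would see outside of "..."."""
    in_quotes = False
    hits = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == '"':
            # Every double quote toggles the state; a preceding
            # backslash has no special meaning in this model.
            in_quotes = not in_quotes
        elif ch == "^" and not in_quotes:
            # Outside quotes, ^ escapes the next character, so skip it.
            i += 1
        elif ch == "&" and not in_quotes:
            hits.append(i)
        i += 1
    return hits


# Assumed raw arguments, already wrapped in outer double quotes.
doubled = '"{ ""job_start"": ""jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC"" }"'
backslashed = '"{ \\"job_start\\": \\"jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC\\" }"'

print(unquoted_ampersands(doubled))      # "" keeps the & inside quotes
print(unquoted_ampersands(backslashed))  # \" leaves the & outside quotes
```

In this model the ""-escaped string yields no unquoted &, whereas the \"-escaped string exposes the &, which is where a command separator would be seen.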

Workaround:

  • Since you're technically calling a batch file, use "" to escape literal " chars. inside your single-quoted ('...') PowerShell string.

  • If you know that the target batch file uses setlocal enabledelayedexpansion or if your computer is configured to use delayed expansion by default, escape ! characters as ^!

    • Note that this is only effective if cmd.exe considers the ! part of a double-quoted string.

Therefore (note that I've extended the URL to include a token with !, meant to be passed through literally as suffix more!):

$param = '{ ""job_start"": ""jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&more^!"" }'

If you need to escape an existing JSON string programmatically:

# Unescaped JSON string, which in an ideal world you'd be able
# to pass as-is.
$param = '{ "job_start": "jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&more!" }'

# Escape the " chars.
$param = $param -replace '"', '""'

# If needed, also escape the ! chars.
$param = $param -replace '!', '^!'
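For comparison, the same two replacement steps can be mirrored in Python, for instance to check that the escaping round-trips. This is only a sketch: escape_for_batch and simulate_unescape are hypothetical helper names, and the unescape step merely assumes that the receiving side collapses "" to " and ^! to !, which is not verified against cmd.exe or spark-submit.

```python
import json


def escape_for_batch(json_text: str, delayed_expansion: bool = False) -> str:
    """Double each " and, if requested, prefix each ! with ^."""
    escaped = json_text.replace('"', '""')
    if delayed_expansion:
        escaped = escaped.replace("!", "^!")
    return escaped


def simulate_unescape(escaped: str) -> str:
    """Assumed inverse of escape_for_batch, for round-trip checks only."""
    return escaped.replace("^!", "!").replace('""', '"')


original = json.dumps(
    {"job_start": "jdbc:mysql://127.0.0.1:3307/test&serverTimezone=UTC&more!"}
)
escaped = escape_for_batch(original, delayed_expansion=True)
print(escaped)

# The simulated round trip should yield the same JSON document.
assert json.loads(simulate_unescape(escaped)) == json.loads(original)
```

Note that JSON text whose string values themselves contain escaped quotes (\") is not handled by this sketch.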

Ultimately, both problems should be fixed at the source, but this is highly unlikely, because it would break backward compatibility.

With respect to PowerShell, this GitHub issue contains the backstory, technical details, a robust wrapper function to hide the problems, and discussions about how to fix the problem at least on an opt-in basis.

1 Comment

@BrunoBernardes. (As it turns out, the twice-escaping would not have worked anyway with respect to !) There must be something else going on, and you need to determine the chain of calls, and where, specifically, the error occurs. The only thing that is certain is that it is cmd.exe that is complaining, i.e. that the error occurs in a batch file.

In the question "Which characters need to be escaped when using Bash?", you will find all the characters that you should escape when passing them as normal characters in the shell; you will also notice that & is one of them.

Now, I understand that if you tried to escape it, the JSON parser you are using would probably fail to parse the string. So one quick workaround would be to replace the & with another symbol that needs no escaping, like @ or %, and add a step in your app that replaces it with & again before parsing. Just make sure that the symbol you choose isn't used in your strings, and won't be used at any time.
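A minimal sketch of that restore step on the Python side might look as follows. The placeholder "@" and the helper name parse_with_placeholder are assumptions made for illustration; the approach only works if the placeholder never occurs in the real data.

```python
import json

# Hypothetical stand-in for &; it must never occur in the real data.
PLACEHOLDER = "@"


def parse_with_placeholder(raw_arg: str, placeholder: str = PLACEHOLDER) -> dict:
    """Swap the placeholder back to & and then parse the JSON text."""
    restored = raw_arg.replace(placeholder, "&")
    return json.loads(restored)


# Example argument in which every & was replaced before the call.
arg = '{"job_start": "jdbc:mysql://127.0.0.1:3307/test@serverTimezone=UTC@useSSL=false"}'
config = parse_with_placeholder(arg)
print(config["job_start"])
```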

3 Comments

Thanks for the reply, but the strings I send in this JSON contain other special characters, and I would need to figure out a replacement for every one of them.
I managed to escape the & character using ^, so the final test string was $param='{\"\"job_start\"\": \"\"jdbc:mysql://127.0.0.1:3307/test^&serverTimezone=UTC\"\"}', which I think is not that great, but patience.
Unfortunately, the question was originally mistagged as bash, which isn't involved, judging by the error messages. Instead, it is cmd.exe via an auxiliary batch file that relays arguments to a Python script. So, unfortunately, there are two shells involved: PowerShell, which has no problem with the &, given that it's inside a quoted string, and cmd.exe, which is where the problem manifests. Two general asides: If something is correctly escaped, the escape character is removed during parsing; also, spark-submit is a third-party tool, so modifying it isn't really an option.
