
TL;DR: How can I json.loads a string that was dumped with a custom separator, without first replacing the separator back with a comma?

I have a Spark DataFrame that I want to write to CSV, and to do that I need to JSON-ize every row in it.

So I have the following pyspark row:

Row(type='le', v=Row(occ=False, oov=False, v=True), x=966, y=340)

I want to make the row ready for CSV. If I write it with plain json.dumps, each line contains many commas, and the CSV read method then fails to parse the file (it sees far more columns than expected).
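
To illustrate the clash, here is a minimal sketch; the plain tuple stands in for the pyspark Row (Row subclasses tuple, so json.dumps serializes both as a JSON array):

import json

# Stand-in for the pyspark Row; Row subclasses tuple, so json.dumps
# treats it the same way.
row = ("le", (False, False, True), 966, 340)

cell = json.dumps(row)
print(cell)  # ["le", [false, false, true], 966, 340]

# A naive comma-delimited read splits inside the JSON:
print(cell.split(","))
# ['["le"', ' [false', ' false', ' true]', ' 966', ' 340]']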

So I perform json.dumps with separators=("| ", ": "), and I get the string s:

'["le"| [false| false| true]| 966| 340]'

Now I'm able to do:

json.loads(s.replace('|',','))

And I receive the desired output:

['le', [False, False, True], 966, 340]
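
Putting those two steps together as a self-contained sketch (again with a plain tuple standing in for the Row):

import json

row = ("le", (False, False, True), 966, 340)

# Use "| " as the item separator so the dumped string contains no commas.
s = json.dumps(row, separators=("| ", ": "))
print(s)  # ["le"| [false| false| true]| 966| 340]

# Swap the pipes back to commas before parsing.
print(json.loads(s.replace("|", ",")))
# ['le', [False, False, True], 966, 340]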

Now comes the problematic part:

I write it to CSV. When I read it back, before trying json.loads, I receive:

'[\\le\\"| [false| false| true]| 966| 340]"'

The desired output is the same as before:

['le', [False, False, True], 966, 340]

But I can't reach it.

When I try to do json.loads, I get:

json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)

When I try to change the '|' to ',':

s = s.replace('|', ',')
s
Out: '[\\le\\", [false, false, true], 966, 340]"'
json.loads(s)
json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)
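
The doubled backslashes and the stray trailing quote look like the CSV layer's quote-escaping rather than anything json.loads can recover from directly. I can't reproduce the exact mangling without knowing the writer's quote/escape settings, but for comparison, here is a sketch of a round trip through Python's csv module, which quotes fields containing commas, so plain json.dumps survives with no pipe workaround (the csv module and column names here are assumptions standing in for whatever writes the real file):

import csv
import io
import json

row = ("le", (False, False, True), 966, 340)
cell = json.dumps(row)  # plain dumps, commas and all

# csv.writer quotes any field containing the delimiter, so the commas
# inside the JSON survive the round trip untouched.
buf = io.StringIO()
csv.writer(buf).writerow([cell, "other_column"])  # second column is hypothetical
print(buf.getvalue())
# "[""le"", [false, false, true], 966, 340]",other_column

buf.seek(0)
(cell_back, _), = csv.reader(buf)
print(json.loads(cell_back))  # ['le', [False, False, True], 966, 340]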

This post is an attempt to get around a previous problem I couldn't find an answer to: Convert multiple array of structs columns in pyspark sql.

A solution to this problem would help me there as well.

Bottom line, this is the string I need to parse:

'[\\le\\"| [false| false| true]| 966| 340]"'

How can I do it?

  • Why are you using separators to generate invalid JSON in the first place? Commented Oct 31, 2019 at 17:30
  • If I write to CSV with normal json.dumps, I will get line with many commas, then the read csv method doesn't read the file (a lot more commas) Commented Oct 31, 2019 at 17:40
  • Does this answer your question? How to save a spark DataFrame as csv on disk? Commented Oct 31, 2019 at 17:43
  • json.dumps isn't writing CSV at all; at best, your quasi-JSON will pass as valid CSV input, but I wouldn't count on it. I would look into avoiding JSON altogether, and getting a list (or something that the csv module can handle) directly from your data frame. Commented Oct 31, 2019 at 17:53
  • @GSazheniuk sadly no, I don't have write permissions on that server; I can only create a CSV and download it. Commented Oct 31, 2019 at 20:01
