I have string data that looks like the bytes repr of JSON in Python:

>>> data = """b'{"a": 1, "b": 2}\n'"""

So inside that string we have valid JSON that looks like it has been byte-encoded. I want to decode the bytes and load the JSON inside, but since it's a string I cannot.

>>> data.decode() # nope
AttributeError: 'str' object has no attribute 'decode'

Encoding the string doesn't seem to help either:

>>> data.encode() # wrong
b'b\'{"a": 1, "b": 2}\n\''

There are oodles of string-to-bytes questions on Stack Overflow, but for the life of me I cannot find anything about this particular issue. Does anyone know how this can be accomplished?

Things that I do not want to do and/or will not work:

  1. eval the data into a bytes object
  2. strip the b and \n (inside of my JSON there's all sorts of other escaped data).

This is the only working solution I have found, and there is a lot not to like about it:

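from ast import literal_eval

# Without an r prefix, \n here is a real newline inside the string.
data = """b'{"a": 1, "b": 2}\n'"""
# Drop that newline (second-to-last character) so the bytes literal parses, then decode.
print(literal_eval(data[:-2] + data[-1:]).decode('utf-8'))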
  • "I have string data that look like bytes reprs of JSON in Python" - that sounds like a bug you should fix on the producing end. Commented Dec 16, 2020 at 19:14
  • I wish I could solve it on that end! These are actually Airflow logs via structlog that I must analyze. Commented Dec 16, 2020 at 19:18
  • Anyway, ast.literal_eval. There's probably a good dupe target somewhere around here. Commented Dec 16, 2020 at 19:27
  • The weird slicing you had to do in your literal_eval attempt is almost certainly due to a bug you introduced while writing a string literal for data: you've got an actual newline in the middle of your bytes literal, which is invalid syntax for a bytes literal. You probably meant for that to be an actual backslash and n, or else the newline was supposed to be outside the bytes literal. Commented Dec 16, 2020 at 19:39
  • data = r"""b'{"a": 1, "b": 2}\n'""" is likely more representative of the actual kinds of values you're working with. If it's not, then that's going to be an issue. Commented Dec 16, 2020 at 19:41
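To illustrate the point made in these comments, here is a minimal sketch. It assumes the real log values contain a literal backslash followed by n (as a repr() of bytes would), which is why the sample is written as a raw string. Under that assumption, ast.literal_eval should need no slicing at all:

```python
import json
from ast import literal_eval

# Raw string: the backslash and "n" stay as two literal characters,
# which is what repr() of a bytes object would contain.
data = r"""b'{"a": 1, "b": 2}\n'"""

# literal_eval only accepts Python literals, so it does not run arbitrary code.
raw = literal_eval(data)                  # a bytes object
parsed = json.loads(raw.decode("utf-8"))  # trailing newline is ignored as whitespace
print(parsed)  # {'a': 1, 'b': 2}
```

If the actual values really do contain a raw newline inside the quotes, this will raise a SyntaxError and the slicing workaround (or some other cleanup) would still be needed.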

1 Answer

I know you said you didn't want to strip the b inside the string due to other escaped data, but can't we assume that whatever generated this only output ASCII (hence the b), so that we can re-encode it? I was thinking you could capture the contents with a simple regexp (https://regex101.com/r/M0ratk/1) and then encode the captured text as bytes.

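import json
import re

# Capture everything between the leading b' and the trailing ' (DOTALL lets . match newlines).
match = re.match(r"\Ab'(.*)'\Z", data, re.DOTALL)
# Re-encode the captured text as ASCII bytes and parse it as JSON.
data = json.loads(bytes(match[1], 'ascii'))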

Will this work? I am not sure how it compares to the literal_eval solution.
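As a rough way to compare the two approaches, the sketch below runs both on a hypothetical sample built with repr(), so that escapes appear as literal backslash sequences. The sample payload and the helper names via_literal_eval and via_regex are made up for illustration; the real log values may look different.

```python
import json
import re
from ast import literal_eval

# Hypothetical sample: repr() of bytes holding JSON with an escaped newline
# inside a string value, followed by a trailing newline byte.
payload = json.dumps({"msg": "line1\nline2"}).encode("ascii") + b"\n"
data = repr(payload)

def via_literal_eval(text):
    # Rebuild the bytes object from its repr, then parse the JSON.
    return json.loads(literal_eval(text))

def via_regex(text):
    # Take the text between b' and ' as-is; escape sequences are NOT decoded.
    match = re.match(r"\Ab'(.*)'\Z", text, re.DOTALL)
    if match is None:
        raise ValueError("input does not look like a b'...' repr")
    return json.loads(bytes(match[1], "ascii"))

print(via_literal_eval(data))
try:
    print(via_regex(data))
except json.JSONDecodeError as exc:
    print("regex approach failed:", exc)
```

On this sample the regex version fails, because the trailing backslash-n stays as two characters after the closing brace, and backslashes inside string values stay doubled. It does work on the sample from the question, where the newline is a real newline character. Note also that repr() switches to double quotes when the payload contains a single quote and no double quote, which this pattern would not match.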

3 Comments

Never in my wildest dreams did I picture a regex solution, wow wow
What can I say, when you spend a good portion of every work day writing regex, everything starts looking like a regex problem 🤪
@NolanConaway and now you have two problems. (It's a famous quote, look it up)
