
I am using Python to select data from a BigQuery public dataset. After getting the result, I need to print it in JSON format. How do I convert the result to JSON? Thanks!

I have tried row[0], but it errors out:

try:
    raw_results = query.rows[0]
    zipped_results = zip(field_names, raw_results)
    results = {x[0]: x[1] for x in zipped_results}
except IndexError:
    results = None

from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT word, word_count
    FROM `bigquery-public-data.samples.shakespeare`
    WHERE corpus = @corpus
    AND word_count >= @min_word_count
    ORDER BY word_count DESC;
"""
query_params = [
    bigquery.ScalarQueryParameter("corpus", "STRING", "romeoandjuliet"),
    bigquery.ScalarQueryParameter("min_word_count", "INT64", 250),
]
job_config = bigquery.QueryJobConfig()
job_config.query_parameters = query_params
query_job = client.query(
    query,
    # Location must match that of the dataset(s) referenced in the query.
    location="US",
    job_config=job_config,
)  # API request - starts the query

# Print the results
for row in query_job:
    print("{}: \t{}".format(row.word, row.word_count))
assert query_job.state == "DONE"

2 Answers


There is no built-in method that converts the results to JSON directly, but a manual conversion is simple:

import json

records = [dict(row) for row in query_job]  # each Row becomes a plain dict
json_obj = json.dumps(records)  # serialize the list itself, not str(records)

Another option is to convert using pandas:

df = query_job.to_dataframe()
json_obj = df.to_json(orient='records')

3 Comments

dict(row) is pretty much the most important part: it turns the tuple-like Row object into a regular dictionary that can be used for many other things. Thanks for pointing it out.
This does not work in Python 3.11 when one of the fields in the query is a datetime (or any other non-serializable type). The following exception is raised: "TypeError: Object of type datetime is not JSON serializable"
@Xist That's not just a Python 3.11 issue; it's a limitation of Python's json module, whose encoder does not know how to serialize datetime objects. To serialize them, you need to supply your own serializer (for example, through the default argument of json.dumps). This Stack Overflow post has some details: stackoverflow.com/questions/11875770/…
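A minimal sketch of such a serializer, assuming the rows have already been converted with dict(row). The function name json_default and the sample record are hypothetical; the handler turns date/datetime/time values into ISO 8601 strings, turns Decimal values (which is how NUMERIC columns typically arrive) into strings to keep full precision, and raises the usual TypeError for anything else.

```python
import datetime
import decimal
import json


def json_default(value):
    """Hypothetical fallback for json.dumps: encode types json can't handle."""
    # date, datetime and time all provide isoformat()
    if isinstance(value, (datetime.date, datetime.datetime, datetime.time)):
        return value.isoformat()
    # A string keeps the exact decimal digits (a float could lose precision)
    if isinstance(value, decimal.Decimal):
        return str(value)
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


# Hypothetical record, shaped like the output of dict(row)
records = [
    {
        "word": "the",
        "created": datetime.datetime(2020, 1, 2, 3, 4, 5),
        "score": decimal.Decimal("1.50"),
    },
]

json_obj = json.dumps(records, default=json_default)
print(json_obj)
# [{"word": "the", "created": "2020-01-02T03:04:05", "score": "1.50"}]
```

Passing default=str is a shorter alternative that stringifies every unsupported value, at the cost of less control over the output format (str() of a datetime uses a space instead of "T").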

You can actually just have BigQuery produce JSON directly. Change your query like this:

query = """
SELECT TO_JSON_STRING(STRUCT(word, word_count)) AS json
FROM `bigquery-public-data.samples.shakespeare`
WHERE corpus = @corpus
AND word_count >= @min_word_count
ORDER BY word_count DESC;
"""

Now the result will have a single column named json, and each row's value is a JSON-formatted string containing that row's word and word_count.
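Because each value in the json column is a separate string, you still have to combine them if you want one JSON document. A minimal sketch, assuming json_rows stands in for the values read from that column (for example [row.json for row in query_job]); the sample strings below are made-up placeholders, not real query output.

```python
import json

# Hypothetical stand-in for [row.json for row in query_job]
json_rows = [
    '{"word":"alpha","word_count":300}',
    '{"word":"beta","word_count":250}',
]

# Option 1: parse each string, then serialize the list as a single JSON array
records = [json.loads(s) for s in json_rows]
json_array = json.dumps(records)

# Option 2: splice the already-valid JSON strings together without parsing
json_array_spliced = "[" + ",".join(json_rows) + "]"

print(json_array)
```

Option 1 validates every row and yields Python objects you can reuse; option 2 avoids the parse step but trusts that each string is already valid JSON.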
