This is the last link in a very important data pipeline. We have the following newline-delimited JSON, which we exported from BigQuery to GCS and then downloaded locally:
```
{"name":"Terripins","fga":"42","fgm":"28","fgPct":0.67}
{"gameTime":"2019-01-12 12:00:00 UTC","gameDate":"2019-01-12","updated":"2019-01-12 20:25:03 UTC","isHome":true,"name":"","fga":"0","fgm":"0"}
{"gameTime":"2019-01-12 12:00:00 UTC","gameDate":"2019-01-12","updated":"2019-01-12 20:25:03 UTC","isHome":true,"name":"Crusaders","fga":"54","fgm":"33","fgPct":0.61}
{"gameTime":"2019-01-12 12:00:00 UTC","gameDate":"2019-01-12","updated":"2019-01-12 20:25:03 UTC","isHome":false,"name":"Greyhounds","fga":"54","fgm":"33","fgPct":0.61}
{"gameTime":"2019-01-12 12:00:00 UTC","gameDate":"2019-01-12","updated":"2019-01-12 20:25:03 UTC","isHome":false,"name":"Greyhounds","fga":"68","fgm":"20","fgPct":0.29}
{"gameTime":"2019-01-12 12:00:00 UTC","gameDate":"2019-01-12","updated":"2019-01-12 20:25:03 UTC","isHome":true,"name":"Crusaders","fga":"68","fgm":"20","fgPct":0.29}
```
We mongoimport this into our MongoDB cluster, and the collection is created successfully.
Unfortunately, when we export the JSON from BigQuery, the integer columns are converted to strings (see fga and fgm), and so are the date columns. This image shows the original schema from BigQuery.
We are trying to use PyMongo, the Python MongoDB client library, to convert fga and fgm to integers. Presumably it is easier to (a) load the "stringified" JSON file into MongoDB and then use PyMongo to update the types, rather than (b) fix the types directly in the JSON file before running mongoimport. So we are trying (a).
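For comparison, option (b) is not much code either. The sketch below is only an illustration under stated assumptions: it uses the standard-library `json` and `datetime` modules, the helper names (`fix_line`, `fix_file`) are hypothetical, and it assumes the installed mongoimport reads MongoDB Extended JSON, so that a value like `{"$date": "2019-01-12T12:00:00.000Z"}` is stored as a BSON Date rather than as a nested document.

```python
import json
from datetime import datetime

# Field names taken from the sample export above.
INT_FIELDS = ("fga", "fgm")
DATETIME_FIELDS = ("gameTime", "updated")  # e.g. "2019-01-12 12:00:00 UTC"
DATE_FIELDS = ("gameDate",)                # e.g. "2019-01-12"


def _to_extended_date(dt):
    """Wrap a (UTC) datetime in Extended JSON so mongoimport can store a BSON Date."""
    return {"$date": dt.strftime("%Y-%m-%dT%H:%M:%S.000Z")}


def fix_line(line):
    """Convert one line of the BigQuery NDJSON export into typed Extended JSON."""
    doc = json.loads(line)
    for field in INT_FIELDS:
        if field in doc:
            doc[field] = int(doc[field])
    for field in DATETIME_FIELDS:
        if field in doc:
            dt = datetime.strptime(doc[field], "%Y-%m-%d %H:%M:%S UTC")
            doc[field] = _to_extended_date(dt)
    for field in DATE_FIELDS:
        if field in doc:
            # BSON has no date-only type, so the date becomes midnight UTC.
            doc[field] = _to_extended_date(datetime.strptime(doc[field], "%Y-%m-%d"))
    return json.dumps(doc)


def fix_file(src_path, dst_path):
    """Rewrite a whole NDJSON file line by line, skipping blank lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(fix_line(line) + "\n")
```

The rewritten file could then be loaded with something like `mongoimport --db <db> --collection our_coll_name --file fixed.json` (file names are placeholders); it is worth checking that the installed mongoimport version actually interprets `$date` as described.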
```python
import pymongo

# ... connect to db and set "db"
our_collection = db["our_coll_name"]

# query and set for "update"
myquery = {}  # for whole table
newvalues = {"$set": {"fga": int(fga)}}  # change to int

# and update
new_output = our_collection.update_many(myquery, newvalues)
print(new_output.modified_count, "documents updated.")
```
This doesn't work: int(fga) raises NameError: name 'fga' is not defined, and if we instead use int("fga"), we get ValueError: invalid literal for int() with base 10: 'fga'.
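Both errors come from the same place: the update document is evaluated by Python before it is sent, so `int(...)` runs client-side and has no access to the stored field. One way to refer to each document's own value is an aggregation-pipeline update, where `"$fga"` is a MongoDB field path and `$toInt` does the conversion on the server. The sketch below assumes MongoDB 4.2+ (pipeline updates and the `$set` stage) and a PyMongo version that accepts a list as the `update_many` update argument (3.9 or later, to our understanding); the helper names are hypothetical.

```python
def int_conversion_pipeline(fields=("fga", "fgm")):
    """Build an update pipeline that converts string fields to ints server-side.

    "$" + field is a field path, so the server reads each document's own value
    instead of a Python variable.
    """
    return [{"$set": {field: {"$toInt": "$" + field} for field in fields}}]


def convert_ints(collection, fields=("fga", "fgm")):
    """Convert the fields on every document where they are all still strings."""
    # Filtering on $type makes the update safe to re-run: already-converted
    # documents no longer match.
    query = {field: {"$type": "string"} for field in fields}
    result = collection.update_many(query, int_conversion_pipeline(fields))
    return result.modified_count
```

With `our_collection` from the snippet above, `convert_ints(our_collection)` would replace the `update_many` call. Note that `$toInt` fails on strings that are not valid integers (for example `""`), so the `fga`/`fgm` values need to be clean.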
Both errors make sense to us, but we're still unsure how to convert fga and fgm to integers in this example. Also, are there MongoDB-specific date and timestamp types we can use for the three fields [gameTime, gameDate, updated], and how can we perform these conversions with PyMongo?
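On the type question: the BSON type intended for application dates is Date (a UTC datetime with millisecond precision); the separate BSON Timestamp type is meant for MongoDB's internal use. PyMongo stores Python `datetime` objects as BSON Dates, so one option is to parse the strings client-side and write them back. This is a minimal sketch under those assumptions; the helper names are hypothetical, the date-only `gameDate` is stored as midnight UTC because BSON has no date-only type, and it issues one `update_one` per document, which is fine for small collections.

```python
from datetime import datetime, timezone


def parse_bq_timestamp(value):
    """Parse a BigQuery export string such as "2019-01-12 20:25:03 UTC"."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S UTC").replace(tzinfo=timezone.utc)


def parse_bq_date(value):
    """Store a date-only string such as "2019-01-12" as midnight UTC."""
    return datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)


def date_updates(doc):
    """Return a $set document for the date fields that are still strings."""
    updates = {}
    for field in ("gameTime", "updated"):
        if isinstance(doc.get(field), str):
            updates[field] = parse_bq_timestamp(doc[field])
    if isinstance(doc.get("gameDate"), str):
        updates["gameDate"] = parse_bq_date(doc["gameDate"])
    return updates


def convert_dates(collection):
    """Rewrite string dates as datetimes, which PyMongo sends as BSON Dates."""
    modified = 0
    projection = {"gameTime": 1, "gameDate": 1, "updated": 1}
    for doc in collection.find({}, projection):
        updates = date_updates(doc)
        if updates:
            result = collection.update_one({"_id": doc["_id"]}, {"$set": updates})
            modified += result.modified_count
    return modified
```

When reading these fields back, PyMongo returns naive UTC datetimes by default; passing `tz_aware=True` to `MongoClient` returns timezone-aware ones instead.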

