I’m reading documents from DocDB (MongoDB) into Spark using the mongo-spark-connector.

One of the fields, fieldA, is a nested object. If fieldA is missing in a document, I replace it with an empty string ("") in my query. This setup has been working fine, but recently I ran into an issue.

I was reading about 14,000 documents, and only 4 of them had no fieldA. The rest either had a full nested object or a smaller object with just a few fields. Because of this mix, Spark now throws:

com.mongodb.spark.exceptions.MongoTypeConversionException: Cannot cast STRING into a StructType(StructField(subField,StringType,true)) (value: BsonString{value=''})

Here’s the DocDB query I’m using:

"db_fieldA": {
  $cond: [
    {
      $or: [
        { $eq: [ { $ifNull: ["$fieldA", null] }, null ] },
        { $eq: [ { $size: { $objectToArray: "$fieldA" } }, 0 ] }
      ]
    },
    "",
    "$fieldA"
  ]
}

Examples of fieldA

Full nested object:

{
    "symbol": "ABCD",
    "siteName": "example.com",
    "desc": "PURCHASE DEMO STORE #9999",
    "tags": [
        {
            "type": "subscription_fee",
            "pattern": "monthly",
            "contextEvents": [
                {
                    "amount": 99.99,
                    "date": "2025-01-15"
                }
            ]
        }
    ],
    "lineage": {
        "source": "system_test",
        "version": "1.0",
        "processedBy": "ETL-Dummy-Job",
        "timestamp": "2025-08-08T12:00:00Z"
    }
}

Smaller object (the case where I'm hitting the issue):

{
    "subField": "some_value"
}

Problematic case (4 documents):

{}

Is there a way to make Spark always treat fieldA as StringType instead of StructType when it infers the schema?

Environment:

  • Spark 3.1.2
  • Scala 2.12
  • mongo-spark-connector_2.12

1 Answer

MongoDB collections don't have a schema, but to read one as a Spark DataFrame, all rows must have the same schema. So fieldA can't be a string and a JSON object at the same time. If it is not present, you shouldn't create an empty string; just drop the field or use null.
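
As a rough sketch of that suggestion, the expression from the question can be kept as it is, with null returned in place of "". The Python helper below only builds the expression and prints it as JSON; it does not connect to DocDB or Spark, and the name struct_or_null is hypothetical, introduced here purely for illustration. Whether the connector then infers a consistent StructType for fieldA is an assumption that should be checked against the real data.

```python
import json


def struct_or_null(field: str) -> dict:
    """Build a $cond expression that yields null (never "") when the
    field is missing, null, or an empty object; otherwise the field itself.

    Hypothetical helper: it mirrors the condition from the question and
    changes only the "then" branch from "" to null.
    """
    ref = f"${field}"
    return {
        "$cond": [
            {
                "$or": [
                    # Missing or null field.
                    {"$eq": [{"$ifNull": [ref, None]}, None]},
                    # Present but empty object: {}.
                    {"$eq": [{"$size": {"$objectToArray": ref}}, 0]},
                ]
            },
            None,  # was "" in the question; None serializes to JSON null
            ref,
        ]
    }


# Projection fragment equivalent to the one in the question.
projection = {"db_fieldA": struct_or_null("fieldA")}
projection_json = json.dumps(projection, indent=2)
print(projection_json)
```

The printed JSON can be dropped into the existing query in place of the original "db_fieldA" expression. With this form, every document carries either an object or null in db_fieldA, so no row mixes a string with a struct.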

1 Comment

I tried imputing null if the value isn't present: "db_fieldA": { $ifNull: ["$fieldA", null] },