I’m reading documents from DocDB (MongoDB) into Spark using the mongo-spark-connector.
One of the fields, fieldA, is a nested object. If fieldA is missing or empty in a document, I replace it with an empty string ("") in my query. This setup had been working fine, but I recently ran into an issue.
I was reading about 14,000 documents, and only 4 of them had no fieldA. The rest either had a full nested object or a smaller object with just a few fields. Because of this mix, Spark now throws:
com.mongodb.spark.exceptions.MongoTypeConversionException: Cannot cast STRING into a StructType(StructField(subField,StringType,true)) (value: BsonString{value=''})
Here’s the DocDB query I’m using:
"db_fieldA": {
  $cond: [
    {
      $or: [
        { $eq: [ { $ifNull: ["$fieldA", null] }, null ] },
        { $eq: [ { $size: { $objectToArray: "$fieldA" } }, 0 ] }
      ]
    },
    "",
    "$fieldA"
  ]
}
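For context, here's roughly how I load the data. The connection string is a placeholder and the wiring is simplified from my actual job, with the projection above wrapped in a $project stage and passed as the connector's aggregation pipeline:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("docdb-fieldA-read")
  // placeholder connection string / database / collection
  .config("spark.mongodb.input.uri", "mongodb://host:27017/mydb.mycollection")
  .getOrCreate()

// The $cond projection shown above, wrapped in a $project stage
val pipeline =
  """[{ "$project": { "db_fieldA": { "$cond": [
    |  { "$or": [
    |    { "$eq": [ { "$ifNull": ["$fieldA", null] }, null ] },
    |    { "$eq": [ { "$size": { "$objectToArray": "$fieldA" } }, 0 ] }
    |  ] },
    |  "",
    |  "$fieldA"
    |] } } }]""".stripMargin

val df = spark.read
  .format("mongo")
  .option("pipeline", pipeline) // schema is inferred by sampling documents
  .load()

As far as I can tell, the read itself is fine; it's the inferred schema for db_fieldA (a struct containing subField) clashing with the "" values that triggers the exception above.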
Examples of fieldA:
Full nested object:
{
  "symbol": "ABCD",
  "siteName": "example.com",
  "desc": "PURCHASE DEMO STORE #9999",
  "tags": [
    {
      "type": "subscription_fee",
      "pattern": "monthly",
      "contextEvents": [
        {
          "amount": 99.99,
          "date": "2025-01-15"
        }
      ]
    }
  ],
  "lineage": {
    "source": "system_test",
    "version": "1.0",
    "processedBy": "ETL-Dummy-Job",
    "timestamp": "2025-08-08T12:00:00Z"
  }
}
Smaller object (the case where I'm facing the issue):
{
  "subField": "some_value"
}
Problematic case (4 documents):
{}
Is there a way to make Spark always treat fieldA as StringType instead of StructType when it infers the schema?
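For illustration, something along these lines is the direction I'm thinking of, though I don't know whether the connector would then deliver the nested objects as JSON strings or just fail the cast in the other direction (other fields omitted; spark and pipeline are as in the snippet above):

import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Explicit schema so inference is skipped; I want db_fieldA kept as a plain string
val desiredSchema = StructType(Seq(
  StructField("db_fieldA", StringType, nullable = true)
))

val dfStringFieldA = spark.read
  .format("mongo")
  .option("pipeline", pipeline)  // same pipeline as above
  .schema(desiredSchema)         // does the connector honor StringType for a nested object here?
  .load()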
Environment:
- Spark 3.1.2
- Scala 2.12
- mongo-spark-connector_2.12