1

Any help would be very much appreciated.

I am trying to build a DataFrame using data from MongoDB.

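import org.apache.spark.sql.SparkSession
import com.mongodb.spark.MongoSpark

// uri is the MongoDB connection string, defined elsewhere
val spark = SparkSession.builder()
  .master("local")
  .appName("app")
  .config("spark.mongodb.input.uri", uri)
  .config("spark.mongodb.input.collection", "collectionName")
  .config("spark.mongodb.input.readPreference.name", "secondary")
  .getOrCreate()

// Load the collection and keep only the first row
val df = MongoSpark.load(spark).limit(1)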

From there I'm trying to read elements row by row. The schema of the DataFrame looks something like this:

root
 |-- A: struct (nullable = true)
 |    |-- oid: string (nullable = true)
 |-- B: boolean (nullable = true)
 |-- C: string (nullable = true)
 |-- D: string (nullable = true)
 |-- E: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- a: string (nullable = true)
 |    |    |-- b: string (nullable = true)
 |    |    |-- c: string (nullable = true)
 |    |    |-- d: string (nullable = true)

If the DataFrame does not include E, df.show() prints out just fine.

However, if the DataFrame does include E, then df.show() gives me:

Cannot cast STRING into a StructType(StructField(a,StringType,true), StructField(b,StringType,true), StructField(c,StringType,true), StructField(d,StringType,true)) (value: BsonString{value='http://...some url...'})

I have tried pretty much every solution to this problem listed on Stack Overflow, but I still have had no luck getting past this error.

How should I approach this problem? Thank you!

2
  • Can you add an example of a MongoDB document that contains E? It seems that E is actually an array of strings rather than an array of structs of strings. Commented Jul 27, 2021 at 17:19
  • @VincentDoba I posted the screenshot of the example of mongodb document below. Commented Jul 28, 2021 at 1:36

2 Answers

0

The error comes from how the MongoDB Spark connector reads the data and converts it to Spark types.

I ran into the same situation. You can avoid it as follows:

  1. Read the data with schema inference enabled.
  2. Save the inferred schema in a separate object.
  3. Read the data again using the saved schema, with schema inference disabled.
  4. Load the data and run whatever analysis you need on the result.
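
The steps above could be sketched roughly as follows. This is only a sketch under assumptions: it assumes a 2.x/3.x MongoDB Spark connector (the one that provides MongoSpark and the "com.mongodb.spark.sql.DefaultSource" format), the URI is a placeholder, and the override of E to an array of plain strings is a guess based on the error message in the question, not something confirmed here. Whether the connector will render mixed elements as strings may depend on the connector version, so treat that part as something to try rather than a guaranteed fix.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}
import com.mongodb.spark.MongoSpark

object ReadWithSavedSchema {
  def main(args: Array[String]): Unit = {
    // Placeholder connection string (assumption): replace with your own
    val uri = "mongodb://localhost:27017/dbName"

    val spark = SparkSession.builder()
      .master("local")
      .appName("app")
      .config("spark.mongodb.input.uri", uri)
      .config("spark.mongodb.input.collection", "collectionName")
      .getOrCreate()

    // Steps 1-2: let the connector infer a schema from a sample, then keep it
    val inferred: StructType = MongoSpark.load(spark).schema

    // Optional adjustment (assumption): declare E's elements as strings,
    // in case some documents store plain strings instead of sub-documents
    val adjusted = StructType(inferred.fields.map {
      case f if f.name == "E" =>
        f.copy(dataType = ArrayType(StringType, containsNull = true))
      case f => f
    })

    // Steps 3-4: read again with the explicit schema, so nothing is inferred
    val df = spark.read
      .format("com.mongodb.spark.sql.DefaultSource")
      .schema(adjusted)
      .load()

    df.show(1, truncate = false)
    spark.stop()
  }
}
```

If E really is uniform across all documents, passing `inferred` unchanged to `.schema(...)` is the literal version of the four steps.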

0

E is actually an array of objects, each containing multiple strings.

[Screenshot: example of MongoDB document]
