In PySpark, how do I convert a DataFrame to a plain string?
Background:
I'm using PySpark with Kafka. Instead of hard-coding the broker name, I have parameterized the Kafka broker name in PySpark.
A JSON file holds the broker details; Spark reads this JSON input and assigns the values to variables. However, these variables are DataFrames (each with a single string column), not strings.
I run into a problem when I pass these DataFrames to the PySpark Kafka connection options to substitute the values.
Error:
TypeError: can only concatenate str (not "DataFrame") to str
JSON parameter file:
```json
{
  "broker": "https://at.com:8082",
  "topicname": "dev_hello"
}
```
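Because the parameter file is a single small JSON object, one alternative is to skip Spark for it and read it on the driver with Python's standard `json` module, which yields plain `str` values directly. This is a sketch that assumes the path is readable through the driver's local filesystem (on some platforms a mount prefix may be needed); the helper names `parse_params` and `load_params` are hypothetical.

```python
import json


def parse_params(text):
    """Parse the parameter JSON text and return (broker, topic) as plain strs."""
    params = json.loads(text)
    return params["broker"], params["topicname"]


def load_params(path):
    """Read the parameter file from the driver's local filesystem (assumed reachable)."""
    with open(path, encoding="utf-8") as fh:
        return parse_params(fh.read())
```

With this, `kserver, ktopic = load_params("/at/dev_parameter.json")` gives two ordinary strings that can be passed straight to `.option(...)`.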
PySpark code:
```python
parameter = spark.read.option("multiline", "true").json("/at/dev_parameter.json")
kserver = parameter.select("broker")     # this is a DataFrame, not a str
ktopic = parameter.select("topicname")   # this is a DataFrame, not a str

(df.selectExpr("CAST(id AS STRING) AS key", "to_json(struct(*)) AS value")
    .write
    .format("kafka")
    .mode("append")  # batch writers use .mode(); .outputMode() belongs to writeStream
    .option("kafka.bootstrap.servers", "f" + kserver)  # raises the TypeError above
    .option("topic", ktopic)
    .save())
```
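One way to get plain strings while still reading the file with Spark, sketched under the assumption that the parameter file always produces exactly one row: collect that row to the driver with `DataFrame.first()` and index the resulting `Row` by column name (`parameter.select("broker").first()[0]` works as well). The helper names `kafka_params` and `write_to_kafka` are hypothetical.

```python
def kafka_params(parameter_df):
    """Return (broker, topic) from the one-row parameter DataFrame as Python strs.

    DataFrame.first() brings a single Row back to the driver; indexing that Row
    by column name yields the value itself rather than another DataFrame.
    """
    row = parameter_df.first()
    return str(row["broker"]), str(row["topicname"])


def write_to_kafka(df, parameter_df):
    """Write df to Kafka using the broker and topic stored in parameter_df."""
    kserver, ktopic = kafka_params(parameter_df)
    (df.selectExpr("CAST(id AS STRING) AS key", "to_json(struct(*)) AS value")
        .write
        .format("kafka")
        .option("kafka.bootstrap.servers", kserver)  # plain str now
        .option("topic", ktopic)                     # plain str now
        .save())
```

In the notebook this would be called as `write_to_kafka(df, spark.read.option("multiline", "true").json("/at/dev_parameter.json"))`. Separately, the `kafka.bootstrap.servers` option expects a comma-separated list of `host:port` pairs, so a value carrying an `https://` scheme, like the one in the parameter file, may need to be changed to a plain `host:port` form.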
Please advise.
My second question: how do I pass these Python variables to another, Scala-based Spark notebook?
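For the second question, one possible approach is to publish the values as a one-row global temporary view. Global temp views live in the `global_temp` database and are visible to every SparkSession within the same Spark application, so this sketch assumes both notebooks run in the same application (for example, attached to the same cluster); if they do not, the view will not be visible. The function name `share_kafka_params` and the view name `kafka_params` are hypothetical.

```python
def share_kafka_params(spark, kserver, ktopic, view_name="kafka_params"):
    """Publish broker/topic as a one-row global temporary view.

    Returns the qualified name the other notebook should read from.
    """
    params = spark.createDataFrame([(kserver, ktopic)], ["broker", "topicname"])
    params.createOrReplaceGlobalTempView(view_name)
    return f"global_temp.{view_name}"
```

On the Scala side, the values could then be read back with something like `spark.table("global_temp.kafka_params").first().getAs[String]("broker")`. A simpler option, when the file path is reachable from both notebooks, is to have the Scala notebook read the same JSON parameter file itself.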