
I have a DataFrame that contains a single row with one column, source_columns, whose value is in the following format:

forecast_id:bigInt|period:numeric|name:char(50)|location:char(50)

I want to retrieve this value as a String and then split it on the | character. As a first step, I tried converting the row from the DataFrame into a String in the following way, so that I could check whether the conversion works:

val sourceColDataTypes = sourceCols.select("source_columns").rdd.map(x => x.toString()).collect()

When I print it with println(sourceColDataTypes) to check the content, I see [Ljava.lang.String;@19bbb216. I don't understand what the mistake is here. Could anyone tell me how to properly fetch a row from a DataFrame and convert it to a String?

1 Answer


You can join the values of each row into a single String with mkString:

df.show()

//Input data
//+-----------+----------+--------+--------+
//|forecast_id|period    |name    |location|
//+-----------+----------+--------+--------+
//|1000       |period1000|name1000|loc1000 |
//+-----------+----------+--------+--------+

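// Dataset.map needs an implicit Encoder[String]; outside spark-shell, import it first
// (assumes the SparkSession is named spark)
import spark.implicits._
df.map(_.mkString(",")).show(false)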

//Output:
//+--------------------------------+
//|value                           |
//+--------------------------------+
//|1000,period1000,name1000,loc1000|
//+--------------------------------+        

df.rdd.map(_.mkString(",")).collect.foreach(println)

//1000,period1000,name1000,loc1000
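
For the one-row, one-column case in the question, the value can also be read directly from the first Row instead of collecting an array. The following is a minimal sketch, not a definitive implementation: the object name RowToString, the helper firstCell, and the local SparkSession are assumptions made for illustration, and it assumes the column holds a non-null string.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RowToString {
  // Hypothetical helper: returns the first column of the first row as a String.
  // Assumes the DataFrame has at least one row and the column is a non-null string.
  def firstCell(df: DataFrame, column: String): String =
    df.select(column).head().getString(0)

  def main(args: Array[String]): Unit = {
    // Local session used only to make the sketch self-contained.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("row-to-string")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the DataFrame loaded over JDBC in the question.
    val sourceCols = Seq(
      "forecast_id:bigInt|period:numeric|name:char(50)|location:char(50)"
    ).toDF("source_columns")

    val raw: String = firstCell(sourceCols, "source_columns")
    println(raw)

    // String.split takes a regex, and '|' is a regex metacharacter, so escape it.
    raw.split("\\|").foreach(println)

    spark.stop()
  }
}
```

Using getString(0) avoids the square brackets that Row.toString adds around the value, which is what x.toString() in the question would produce.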

6 Comments

I assigned this expression to a val and printed it: val sourceColDataTypes = sourceCols.rdd.map(_.mkString(",")).collect.foreach(println); println("source columns: "); println(sourceColDataTypes). The output is: source columns: () The foreach prints the actual data, but the val sourceColDataTypes itself holds nothing.
1. I am reading a column value from an RDBMS table that holds the column details needed to create a Hive table: val sourceCols = spark.read.format("jdbc").option("url", hiveMetaConURL) .option("dbtable", "(select source_columns from base.coltables where tablename='base.forecast') as sCols") .option("user", metaUserName) .option("password", metaPassword) .load() 2. The data is separated by "|". I want to convert it to a String, replace "|" with a comma, and create a DDL statement from that String. For that, I'm trying to convert the value in the DataFrame to a String.
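
The DDL step described in this comment could look roughly like the sketch below, which works on the plain String once it has been fetched. The object name DdlFromColumnSpec, the helpers columnList and createTable, and the table name base.forecast_hive are hypothetical. The types are passed through unchanged, so any mapping from the source database types to Hive types would still have to be added.

```scala
object DdlFromColumnSpec {
  // Turns "name:type|name:type" into "name type, name type".
  def columnList(spec: String): String =
    spec
      .split("\\|")            // '|' is a regex metacharacter, so it must be escaped
      .map(_.trim)
      .filter(_.nonEmpty)
      .map { field =>
        // A limit of 2 splits only on the first ':' and leaves the type untouched.
        field.split(":", 2) match {
          case Array(name, dataType) => s"${name.trim} ${dataType.trim}"
          case _ =>
            throw new IllegalArgumentException(s"Expected name:type but got '$field'")
        }
      }
      .mkString(", ")

  // Hypothetical helper that wraps the column list in a CREATE TABLE statement.
  def createTable(tableName: String, spec: String): String =
    s"CREATE TABLE $tableName (${columnList(spec)})"

  def main(args: Array[String]): Unit = {
    val spec = "forecast_id:bigInt|period:numeric|name:char(50)|location:char(50)"
    // base.forecast_hive is a made-up table name for illustration.
    println(createTable("base.forecast_hive", spec))
  }
}
```

Replacing "|" with "," alone would leave the ":" between each name and type, which is why the sketch splits each field as well.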
If you assigned sourceCols.rdd.map(_.mkString(",")) to sourceColDataTypes, it won't work, because that is an RDD. You need to call collect to turn it into a local collection, and then use foreach to print it.
I did use collect. What I tried is in my first comment.
This is the output: source columns: [Ljava.lang.String;@236206f8
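
Both outputs seen in this thread come from plain Scala behaviour rather than from Spark: foreach returns Unit, so a val bound to it prints as (), and println on an Array prints the JVM type and hash instead of the elements. The sketch below illustrates this without Spark. The object name PrintCollected and the helper render are assumptions, and the array is a stand-in for the collected result.

```scala
object PrintCollected {
  // Stand-in for sourceCols.rdd.map(_.mkString(",")).collect()
  val collected: Array[String] =
    Array("forecast_id:bigInt|period:numeric|name:char(50)|location:char(50)")

  // Hypothetical helper: renders the elements, since Array.toString does not.
  def render(values: Array[String]): String = values.mkString("\n")

  def main(args: Array[String]): Unit = {
    // foreach returns Unit, so a val bound to it holds () and not the data.
    val nothing: Unit = collected.foreach(println)
    println(nothing)            // prints ()

    println(collected)          // prints a reference such as [Ljava.lang.String;@...
    println(render(collected))  // prints the actual contents

    // With a single row, the one String can be taken directly.
    val single: String = collected.head
    println(single)
  }
}
```

In other words, keep the result of collect in the val, and call mkString or head on it when a String is needed.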
