
I have a column col1 that represents a GPS coordinate format:

25 4.1866N 55 8.3824E

I would like to split it into multiple columns using whitespace as the separator, as in the example output table_example below:

| 1st_split | 2nd_split | 3rd_split | 4th_split |
|:----------|:----------|:----------|:----------|
|    25     |  4.1866N  |    55     |  8.3824E  |

Since there is a split() function, I tried the following:

SELECT explode(split(`col1`, ' ')) AS `col` FROM table_example;

But instead of splitting into multiple columns, it splits into multiple rows, as in the output below:

+-------+
|col    |
+-------+
|25     |
|4.1866N|
|55     |
|8.3824E|
+-------+

Can someone clarify what would be the right approach to get the expected result?

1 Answer

If you have a dataframe as

+---------------------+
|col                  |
+---------------------+
|25 4.1866N 55 8.3824E|
+---------------------+

Using Scala API

You can simply use the built-in split function and then select each array element, as follows:

import org.apache.spark.sql.functions._

df.withColumn("split", split(col("col"), " "))
  .select(
    col("split")(0).as("1st_split"),
    col("split")(1).as("2nd_split"),
    col("split")(2).as("3rd_split"),
    col("split")(3).as("4th_split"))
  .show(false)

which would give you

+---------+---------+---------+---------+
|1st_split|2nd_split|3rd_split|4th_split|
+---------+---------+---------+---------+
|25       |4.1866N  |55       |8.3824E  |
+---------+---------+---------+---------+
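For anyone asking about Python in the comments: the core of the approach, independent of any Spark API, is just "split, then index by position". A plain-Python sketch of that logic (for illustration only, not the PySpark API):

```python
# Plain-Python illustration of the split-then-index logic used above.
# The value and column names match the example in this answer.
row = "25 4.1866N 55 8.3824E"
parts = row.split(" ")  # ["25", "4.1866N", "55", "8.3824E"]

names = ["1st_split", "2nd_split", "3rd_split", "4th_split"]
columns = dict(zip(names, parts))
print(columns)
# {'1st_split': '25', '2nd_split': '4.1866N', '3rd_split': '55', '4th_split': '8.3824E'}
```

In PySpark the same pattern applies: split produces an array column, and each element is selected by index and given an alias.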

Using SQL way

SQL is just as easy and similar to the API way:

df.createOrReplaceTempView("table_example")
val splitted = sqlContext.sql("SELECT split(`col`, ' ') AS `col` FROM table_example")

splitted.createOrReplaceTempView("splitted_table")
val result = sqlContext.sql("SELECT `col`[0] AS `1st_split`, `col`[1] AS `2nd_split`, `col`[2] AS `3rd_split`, `col`[3] AS `4th_split` FROM splitted_table")
result.show(false)
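If you prefer a single query, the two steps can be collapsed with a subquery; same split and element indexing, just nested. (This is the plain-SQL form the asker also arrived at in the comments.)

```sql
SELECT `col`[0] AS `1st_split`,
       `col`[1] AS `2nd_split`,
       `col`[2] AS `3rd_split`,
       `col`[3] AS `4th_split`
FROM (SELECT split(`col`, ' ') AS `col` FROM table_example)
```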

I hope the answer is helpful.


6 Comments

But this is slightly different from a standard SQL approach. I mean, I cannot use that syntax within a SQL context.
In your answer you're using PySpark syntax for the dataframe, I suppose.
Assuming you cannot help me solve the problem on the SQL side, do you have an idea of how the same Scala snippet would look in Java or Python syntax?
Thanks for the SQL example! If you could also provide a sort of Java example, that would be great, because the Scala example is not so useful for my case/scenario. I suppose they would be quite similar.
This is what I was searching for (just plain SQL within one query): SELECT col[0] AS 1st_split, col[1] AS 2nd_split, col[2] AS 3rd_split, col[3] AS 4th_split FROM ( SELECT split(col, ' ') AS col FROM example_table )... but you pointed me the right way with your SQL example :-)
