I am attempting to split a string column into 4 columns (A, B, C, D) on Databricks using Python.

# Load CSV file
RawDataDF = spark.read.format("csv").options(header='false').load("file path")

# Rename header
RawDataDF = RawDataDF.withColumnRenamed("_c0","raw")

#Attempt to split "raw" into 4 columns:
splitDF = RawDataDF.withColumn("split_raw_arr", split("raw", " "))
uDataDF= uDataDF.withColumn('Column A', splitDF.getItem(0))
uDataDF= uDataDF.withColumn('Column B', splitDF.getItem(1))
uDataDF= uDataDF.withColumn('Column C', splitDF.getItem(2))
uDataDF= uDataDF.withColumn('Column D', splitDF.getItem(3))

Error message:

AttributeError: 'DataFrame' object has no attribute 'getItem'

Any advice is appreciated.

how about splitDF[0]? Commented Aug 1, 2021 at 21:34

1 Answer

The use of split to create individual columns is correct.

However, you cannot call getItem directly on a DataFrame (splitDF). getItem is a Column method, which is what the error is telling you.

Also, the question appears to be missing the initialization of uDataDF, and you are trying to create its columns from values in splitDF, which is not possible without a join.

withColumn won't allow this, because its second argument must be a Column that belongs to the DataFrame it is called on.

Instead, you can create the columns directly on splitDF and then select the ones you want to keep, producing a new DataFrame, uDataDF.

Typical Example - Split

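from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType

input_list = [
  (1,"7 6 10")
  ,(2,"4 59 9")
  ,(4,"5 00 12")
  ,(5,"0 10 241")
  ,(6,"7 19 62")
  ,(7,"1 42 743")
  ,(8,"6 23 90")
]

# `spark` is the SparkSession that Databricks provides in a notebook
sparkDF = spark.createDataFrame(input_list,['id','raw_str'])

# Split on a single space, take each array element, and cast it to a double
sparkDF = sparkDF.withColumn('A',F.split(F.col('raw_str'),' ').getItem(0).cast(DoubleType()))\
                 .withColumn('B',F.split(F.col('raw_str'),' ').getItem(1).cast(DoubleType()))\
                 .withColumn('C',F.split(F.col('raw_str'),' ').getItem(2).cast(DoubleType()))

# Keep only the newly created columns
uDataDF = sparkDF.select(['A','B','C'])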

uDataDF.show()
+---+----+-----+
|  A|   B|    C|
+---+----+-----+
|7.0| 6.0| 10.0|
|4.0|59.0|  9.0|
|5.0| 0.0| 12.0|
|0.0|10.0|241.0|
|7.0|19.0| 62.0|
|1.0|42.0|743.0|
|6.0|23.0| 90.0|
+---+----+-----+


4 Comments

Thank you, this is very useful, but I am still having a hard time splitting the string as desired. Do you have any advice on how I can separate a string into 4 columns by using spaces? In the above example, you separated the string by '\.' P.S. Splitting by '\ ' or ' ' did not work. I assume there must be some other convention for spaces?
Updated the answer; a space works just as well as the delimiter
Spoke too soon! It worked by using '\t'. Thanks again for all your help
Accepted! Thanks again Vaebhav :)
