2

I have the following source file. It contains the name "john", which I want to split into a list of characters: ['j','o','h','n']. The person file is shown below.

Source File:

id,name,class,start_data,end_date
1,john,xii,20170909,20210909

Code:

from pyspark.sql import SparkSession

def main():
    spark = SparkSession.builder.appName("PersonProcessing").getOrCreate()

    df = spark.read.csv('person.txt', header=True)
    nameList = [x['name'] for x in df.rdd.collect()]
    print(list(nameList))
    df.show()

if __name__ == '__main__':
    main()

Actual Output:

[u'john']

Desired Output:

['j','o','h','n']

4 Answers

5

If you want to do it in Python:

nameList = [c for x in df.rdd.collect() for c in x['name']]

Or, if you want to do it in Spark:

from pyspark.sql import functions as F

df.withColumn('name', F.split(F.col('name'), '')).show()

Result:

+---+--------------+-----+----------+--------+
| id|          name|class|start_data|end_date|
+---+--------------+-----+----------+--------+
|  1|[j, o, h, n, ]|  xii|  20170909|20210909|
+---+--------------+-----+----------+--------+
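For the Python variant above, here is a minimal sketch that runs without a Spark session. The `rows` list is a hypothetical stand-in for what `df.rdd.collect()` returns; plain dicts are used only for illustration, on the assumption that they are indexed the same way (`row['name']`) as Spark Row objects. The second row is made up to show what happens with more than one name.

```python
# Hypothetical stand-in for df.rdd.collect(); these dicts are illustrative
# only and are indexed like Spark Row objects (row["name"]).
rows = [
    {"id": "1", "name": "john", "class": "xii"},
    {"id": "2", "name": "amy", "class": "xi"},  # made-up extra row
]

# Flatten every name into a single list of characters,
# which is what the nested comprehension in the answer does.
flat_chars = [c for row in rows for c in row["name"]]

# Alternative: keep one list of characters per row.
chars_per_row = [list(row["name"]) for row in rows]

print(flat_chars)
print(chars_per_row)
```

With a single row the two forms carry the same information, but with several rows the flattened list runs all names together, so the per-row form may be closer to what is wanted.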


0
nameList = [x for x in 'john']
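
Iterating over a string yields its characters one at a time, so the comprehension is equivalent to calling the built-in `list()` on the string. A short sketch (the variable names are illustrative, not from the question):

```python
name = "john"

# Comprehension form, as in the answer.
from_comprehension = [x for x in name]

# Equivalent built-in form.
from_list = list(name)

print(from_comprehension == from_list)
```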


0

.tolist() turns a pandas Series into a Python list, so you should first create a list from the data and then loop over it.

namelist=df['name'].tolist()
for x in namelist:
    print(x)

1 Comment

It is not a pandas DataFrame.

0

If you are doing this in Spark with Scala (Spark 2.3.1 and Scala 2.11.8), the code below works. Splitting on an empty pattern produces an extra record with a blank name, so that record is filtered out.

import org.apache.spark.sql.functions._
import spark.implicits._

val classDF = spark.sparkContext
  .parallelize(Seq((1, "John", "Xii", "20170909", "20210909")))
  .toDF("ID", "Name", "Class", "Start_Date", "End_Date")

classDF.withColumn("Name", explode(split(trim(col("Name")), "")))
  .withColumn("Start_Date", to_date(col("Start_Date"), "yyyyMMdd"))
  .withColumn("End_Date", to_date(col("End_Date"), "yyyyMMdd"))
  // drop the blank record left over from splitting on an empty pattern
  .filter(col("Name") =!= "")
  .show
