Pyspark create testing data with string type

Question

I am trying to create a test DataFrame with one column of Int type and one column of String type, with output similar to the table below. I reckon for the Int column we could use

dataset = spark.range(1, 6)  # end is exclusive, so this yields ids 1..5
output = dataset.withColumnRenamed('id', 'myid')

How do we deal with that string column? Many thanks for your help!

Expected output:

      id      ordernum
       1       0032
       2       0033
       3       0034
       4       0035
       5       0036

mck · Accepted Answer · 2021-02-23 08:41:23Z

2

You can create a Spark dataframe from a list of lists. Here is an example:

data = [[i, '%04d' % (i+31)] for i in range(1,6)]
# [[1, '0032'], [2, '0033'], [3, '0034'], [4, '0035'], [5, '0036']]

df = spark.createDataFrame(data, ['id', 'ordernum'])
df.show()
+---+--------+
| id|ordernum|
+---+--------+
|  1|    0032|
|  2|    0033|
|  3|    0034|
|  4|    0035|
|  5|    0036|
+---+--------+
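Note that Spark infers bigint (LongType) for Python ints, so if the test data really needs an Int column, an explicit schema is one way to get it. Below is a minimal sketch under that assumption; the helper names make_rows and to_dataframe are made up for illustration, and to_dataframe expects an existing SparkSession to be passed in.

```python
def make_rows(n, offset=31, width=4):
    """Build (id, ordernum) tuples; ordernum is a zero-padded string."""
    return [(i, f"{i + offset:0{width}d}") for i in range(1, n + 1)]


def to_dataframe(spark, rows):
    """Create a DataFrame with an explicit schema (requires pyspark)."""
    # Imported here so make_rows stays usable without pyspark installed.
    from pyspark.sql.types import IntegerType, StringType, StructField, StructType

    schema = StructType([
        StructField("id", IntegerType(), nullable=False),
        StructField("ordernum", StringType(), nullable=False),
    ])
    return spark.createDataFrame(rows, schema)


rows = make_rows(5)
print(rows)
```

With a session available, to_dataframe(spark, make_rows(5)).printSchema() should report id as integer and ordernum as string.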

If you prefer Spark range, you can use format_string:

import pyspark.sql.functions as F
df = spark.range(1, 6).withColumn(
    'ordernum',
    F.format_string('%04d', F.col('id') + 31)
)

df.show()
+---+--------+
| id|ordernum|
+---+--------+
|  1|    0032|
|  2|    0033|
|  3|    0034|
|  4|    0035|
|  5|    0036|
+---+--------+

edited Feb 23, 2021 at 8:41

answered Feb 23, 2021 at 8:26

mck


blackbishop · Accepted Answer · 2021-02-23 08:30:27Z

2

You can use the lpad function to create the ordernum column: take id + 31 and left-pad it with 0 to get a 4-character numeric string:

from pyspark.sql import functions as F

output = spark.range(1, 6).withColumn("ordernum", F.lpad(F.col("id") + 31, 4, '0'))

output.show()
#+---+--------+
#| id|ordernum|
#+---+--------+
#|  1|    0032|
#|  2|    0033|
#|  3|    0034|
#|  4|    0035|
#|  5|    0036|
#+---+--------+
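One difference to keep in mind when choosing between lpad and format_string: as far as I know, lpad shortens values that are longer than the target length, while '%04d' only sets a minimum width. The sketch below emulates that behaviour in plain Python to show the effect. lpad_like is a hypothetical stand-in written for illustration, not Spark's implementation.

```python
def lpad_like(value, length, pad):
    """Pure-Python emulation of SQL-style lpad, for illustration only."""
    s = str(value)
    if len(s) >= length:
        return s[:length]  # longer input is cut down to `length` characters
    # Repeat the pad string and keep just enough characters to fill the gap.
    fill = (pad * length)[: length - len(s)]
    return fill + s


for n in (32, 12345):
    # Columns: input, lpad-style result, printf-style result
    print(n, lpad_like(n, 4, "0"), "%04d" % n)
```

For order numbers that stay within 4 digits the two approaches agree; they only diverge once the number outgrows the width.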

answered Feb 23, 2021 at 8:30

blackbishop
