
Requirement: I need to format the row number as a 4-digit string, left-padded with zeros.

Example: 0001, 0002, ..., 0011, 0012

Here I am repeating the zero prefix a number of times that depends on the length of the row number value in the PAGENO column:

df.select(F.repeat(F.lit(0), 3))

The value 3 needs to change dynamically based on the row number value.

My idea to achieve dynamic zero replication:

df.select(F.repeat(F.lit(0),(4 - F.length(df["PAGENO"]))))

This raises the error:

'Column' object is not callable

I expected that passing a column or expression, instead of the literal 3, as the repeat count would work.

1 Answer


You can pass a column as the repeat count by calling repeat inside a SQL expression:

df.select(F.expr("repeat(0, length(PAGENO))")).show()

However, if I've understood your question correctly, what you want is the lpad function. Here's an example:

from pyspark.sql import functions as F

df = spark.createDataFrame([(1,), (2,), (10,), (12,), (11,)], ["PAGENO"])

df1 = df.withColumn("PAGENO_2", F.expr("lpad(PAGENO, 4, '0')"))

df1.show()
#+------+--------+
#|PAGENO|PAGENO_2|
#+------+--------+
#|     1|    0001|
#|     2|    0002|
#|    10|    0010|
#|    12|    0012|
#|    11|    0011|
#+------+--------+
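For intuition about what lpad does to each value, here is a small plain-Python model of its behaviour. It is a sketch for illustration only, not Spark code: the function name lpad_model is hypothetical, and the handling of a non-positive length is an assumption of this model.

```python
def lpad_model(value, length, pad="0"):
    """Plain-Python model of lpad: pad on the left, or cut to `length`."""
    s = str(value)
    if length <= 0:
        # Assumption of this model: nothing is left for a non-positive length.
        return ""
    if len(s) >= length:
        # Already long enough: keep only the first `length` characters.
        return s[:length]
    missing = length - len(s)
    # Repeat the pad string and trim it to exactly the missing width.
    return (pad * missing)[:missing] + s


for n in (1, 12, 9999, 12345):
    print(n, "->", lpad_model(n, 4))
```

Note the last case: the Spark SQL documentation states that when the input is longer than the requested length, lpad shortens the result to that length, so a five-digit value would lose its last digit rather than raise an error.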

2 Comments

Thanks @blackbishop, this made my day easier. As an extension of the above requirement, I need to reset "PAGENO" once it reaches 9999 and start again from 0001. Any suggestions on how to achieve this?
