4

Below is my logic for adding a sequence number column to a dataframe. It works as expected when I read data from delimited files. Today I have a new task: read the data from an Oracle table, add a sequence number, and process it further. I am facing an issue with the logic below when I read from the Oracle table.

oracleTableDF is my dataframe

   //creating sequence number logic for SeqNum (zipWithIndex is 0-based, so add 1)
   val rowRDD = oracleTableDF.rdd.zipWithIndex().map { indexedRow =>
     Row.fromSeq((indexedRow._2 + 1L) +: indexedRow._1.toSeq)
   }

  //creating StructType to add Seqnum in schema
        val newstructure = StructType(Array(StructField("SeqNum",LongType)).++(oracleTableDF.schema.fields))

  //creating new Data Frame with seqnum
  oracleTableDF = spark.createDataFrame(rowRDD, newstructure)

I am not able to locate the actual issue, because the logic works as expected on the cluster when I read from files, and it also works as expected in local mode. It only fails on the cluster when I read from the Oracle table.

below is the error :

"ERROR scheduler.TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, xxxx, executor 1): java.lang.NoClassDefFoundError: Could not initialize class oracleDataProcess$"

  • Is this SeqNum column like a row number? Commented Sep 11, 2017 at 7:06
  • Yes, I am adding a row number to each record. Commented Sep 11, 2017 at 7:46
  • Yes, I need to add a row number to each record. It works fine for delimited files, but I am facing this issue when I try it on the Oracle table. Commented Sep 11, 2017 at 7:52
  • You can use Window function row_number instead of monotonicallyIncreasingId but you would have to work with a single partition to generate continuous row_number. Commented Sep 11, 2017 at 8:26
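The single-partition Window approach mentioned in the last comment could be sketched like this (a minimal, self-contained example; the sample dataframe and the "SeqNum" column name are placeholders, not the asker's actual data):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

object RowNumberSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("seqnum-sketch").getOrCreate()
    import spark.implicits._

    // Placeholder dataframe standing in for oracleTableDF
    val df = Seq("a", "b", "c").toDF("value")

    // A window with no partitionBy pulls all rows into a single partition,
    // which is exactly what makes row_number produce continuous 1, 2, 3, ...
    val w = Window.orderBy("value")
    val withSeq = df.withColumn("SeqNum", row_number().over(w))

    withSeq.show()
    spark.stop()
  }
}
```

Note the trade-off the comment hints at: the single partition makes the numbering continuous but removes parallelism for that stage, so it is only practical for moderately sized tables.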

2 Answers 2

6

If all you need is to add a column to your dataframe with an auto-increment integer value, you can use monotonicallyIncreasingId which is of LongType:

val oracleTableDF2 = oracleTableDF.withColumn("SeqNum", monotonicallyIncreasingId)

[UPDATE]

Note that monotonicallyIncreasingId is deprecated. monotonically_increasing_id() should be used instead.


6 Comments

As per the discussion on "stackoverflow.com/questions/35705038/…", it's not a clear function, and it does not take any parameters. I need to pass some parameters to it.
I believe issue SPARK-14393 has already been resolved since Spark 2.1. You're right that monotonicallyIncreasingId doesn't take parameters.
@LeoC, is there any way to start the values from 1? Right now they start from 0.
@user4342532, monotonically_increasing_id() guarantees only generating Longs in a monotonically increasing fashion. It may start from 0 or some Long integer. If you want to have control in the id assignment, you should consider using zipWithIndex on a RDD, or row_number() Window function as illustrated in part of this blog post.
@user4342532, unfortunately monotonically_increasing_id() only guarantees the generated numbers are monotonically increasing. There is no guarantee they will be consecutive integers.
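For the 1-based requirement raised in these comments, the zipWithIndex route from the question already works; a minimal sketch (the sample dataframe is a placeholder standing in for the Oracle-backed one):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object ZipWithIndexSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("seqnum-sketch").getOrCreate()
    import spark.implicits._

    // Placeholder dataframe standing in for oracleTableDF
    val df = Seq("a", "b", "c").toDF("value")

    // zipWithIndex assigns 0-based indices, so add 1 to start the sequence at 1
    val rowRDD = df.rdd.zipWithIndex().map { case (row, idx) =>
      Row.fromSeq((idx + 1L) +: row.toSeq)
    }

    // Prepend the SeqNum field to the existing schema
    val schema = StructType(StructField("SeqNum", LongType, nullable = false) +: df.schema.fields)
    val withSeq = spark.createDataFrame(rowRDD, schema)

    withSeq.show()
    spark.stop()
  }
}
```

Unlike monotonically_increasing_id(), this guarantees consecutive values starting at 1, at the cost of an extra pass over the RDD.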
0

One option is monotonically_increasing_id(), which creates a new column with an incremental id:

val dataFrame = oracleTableDF.withColumn("incremental_id", monotonically_increasing_id())

