1

I need to add sequence number to each row I am processing in a dataframe. But everytime when I add, we need to get the max of sequence from the existing rows and add + 1 and assign it to new row.

Any idea How we can achieve this with dataframe in spark scala.

Example.

below is the existing data in a table:

row_id,emp_id, sal
1,11,2000
2,22,3000

Now I need to add new row as follows to the table:

3,33,5000

we need to get row id every time when we are inserting new data to the table by getting max(row_id) from the table and add +1 to it.

Please suggest any ideas.

Thanks,

1 Answer 1

1

Spark DataFrames are immutable so it is not possible to append / insert rows. Instead use union. Here's a quick solution to your problem. This is not a good solution since you need to perform union every time a new row is added.

val data = spark
  .read
  .option("inferSchema", "true")
  .option("header", "true")
  .csv("data.csv")

data.createOrReplaceTempView("dView")
val sqld = spark.sql("SELECT MAX(row_id)+1,SUM(emp_id),SUM(sal) FROM dView")
val finalD = data.union(sqld)
finalD.show()
spark.stop()

data.csv

row_id,emp_id, sal
1,11,2000
2,22,3000

Output:

+------+------+----+
|row_id|emp_id| sal|
+------+------+----+
|     1|    11|2000|
|     2|    22|3000|
|     3|    33|5000|
+------+------+----+
Sign up to request clarification or add additional context in comments.

1 Comment

Thank you for your response Binoy J. Suppose if I have a dataframe with 50 updated records and 20 Insert(new records). All update records will have row_id and insert records will not have the value in it. I need to take max(row_id) from the updated records and add 1 to it and add this value to insert records sequentially. Can this be implemented for the above senario..

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.