0

I have a dataframe df1 of the format

+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| A    | z    | m    |
| B    | w    | n    |
| C    | x    | o    |
| A    | z    | n    |
| A    | p    | o    |
+------+------+------+

and another dataframe df2 of the format

+------+------+
| Col1 | Col2 |
+------+------+
| 0-A  | 0-z  |
| 1-B  | 3-w  |
| 2-C  | 1-x  |
|      | 2-P  |
+------+------+-

I am trying to replace the values in Col1 and Col2 of df1 with values from df2 using Spark Java.

The end dataframe df3 should look like this.

+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| 0-A  | 0-z  | m    |
| 1-B  | 3-w  | n    |
| 2-C  | 1-x  | o    |
| 0-A  | 0-z  | n    | 
| 0-A  | 2-p  | o    |
+------+------+------+

I am trying to replace all the values in the column1 and column2 of df1 with values from col1 and col2 of df2. Is there anyway that i can achieve this in Spark Java dataframe syntax.?

The initial idea i had was to do the following.

String pattern1="\\p{L}+(?: \\p{L}+)*$";

df1=df1.join(df2, df1.col("col1").equalTo(regexp_extract(df2.col("col1"),pattern1,1)),"left-semi");
2
  • do you have actual data with same data format like 0-A and A, simply is it possible to check df1.col1 is in df2.col1? or df1.col1 can have any value Commented Feb 14, 2020 at 6:27
  • The data is of the same format like i specified above.I just need to replace the value in the df1 using the value in df2. Commented Feb 14, 2020 at 6:36

1 Answer 1

1

Replace your last join operation with below join.

df1.alias("x").join(df2.alias("y").select(col("y.Col1").alias("newCol1")), col("x.Col1") === regexp_extract(col("newCol1"),"\\p{L}+(?: \\p{L}+)*$",0), "left")
              .withColumn("Col1", col("newCol1"))
              .join(df2.alias("z").select(col("z.Col2").alias("newCol2")), col("x.Col2") === regexp_extract(col("newCol2"),"\\p{L}+(?: \\p{L}+)*$",0), "left")
              .withColumn("Col2", col("newCol2"))
              .drop("newCol1", "newCol2")
              .show(false)

+----+----+----+
|Col1|Col2|Col3|
+----+----+----+
|2-C |1-x |o   |
|0-A |0-z |m   |
|0-A |0-z |n   |
|0-A |2-p |o   |
|1-B |3-w |n   |
+----+----+----+
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.