I have two DF's(railroadGreaterFile, railroadInputFile).
I want to drop records from railroadGreaterFile if data in MEMBER_NUM column from railroadGreaterFile is matching the data in MEMBER_NUM column from railroadInputFile
Below is what i used:
val columnrailroadInputFile = railroadInputFile.withColumn("check", lit("check"))
val railroadGreaterNotInput = railroadGreaterFile
.join(columnrailroadInputFile, Seq("MEMBER_NUM"), "left")
.filter($"check".isNull)
.drop($"check")
Doing above, records are dropped, however i witnessed railroadGreaterNotInput's schema is combination of my DF1 and DF2 so when I try to write the railroadGreaterNotInput's data to file, it gives me below error
org.apache.spark.sql.AnalysisException: Reference 'GROUP_NUM' is ambiguous, could be: GROUP_NUM#508, GROUP_NUM#72
What should i be doing so that railroadGreaterNotInput would only contain fields from railroadGreaterFile DF?