
I have the code below, where Id is a 36-character GUID. The code executes, but when a matching record is found, the merge inserts the entire record again instead of updating it. What could be the root cause?

    delta_table.alias("target").merge(
        deduped_df.alias("source"),
        "trim(upper(target.Id)) = trim(upper(source.dId))",
    ).whenMatchedUpdate(set={
        "Id": "source.dId", "EntityId": "source.EntityId", "PropertyName": "source.PropertyName",
        "ValueString": "source.ValueString", "ValueInt": "source.ValueInt",
        "ValueDecimal": "source.ValueDecimal", "ValueBit": "source.ValueBit",
        "ValidFrom": "source.ValidFrom", "ValidTo": "source.ValidTo",
        "Description": "source.Description", "ModifiedBy": "source.ModifiedBy",
        "CreatedAt": "source.CreatedAt", "CreatedBy": "source.CreatedBy",
        "Active": "source.Active", "Saved": "source.Saved",
        "ETL_UpdateDate": "source.ETL_UpdateDate", "ETL_Source": "source.ETL_Source",
    }).whenNotMatchedInsert(values={
        "Id": "source.dId", "EntityId": "source.EntityId", "PropertyName": "source.PropertyName",
        "ValueString": "source.ValueString", "ValueInt": "source.ValueInt",
        "ValueDecimal": "source.ValueDecimal", "ValueBit": "source.ValueBit",
        "ValidFrom": "source.ValidFrom", "ValidTo": "source.ValidTo",
        "Description": "source.Description", "ModifiedBy": "source.ModifiedBy",
        "CreatedAt": "source.CreatedAt", "CreatedBy": "source.CreatedBy",
        "Active": "source.Active", "Saved": "source.Saved",
        "ETL_UpdateDate": "source.ETL_UpdateDate", "ETL_LoadDate": "source.ETL_LoadDate",
        "ETL_Source": "source.ETL_Source",
    }).execute()
  • Has this been solved? Was anything in my answer useful? If not, you can add a comment. (Commented Nov 8 at 16:09)

1 Answer


I think your merge condition is not matching, even though the IDs on both sides of the records look the same.

Check the data that this condition is applied to:

    trim(upper(target.Id)) = trim(upper(source.dId))
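To see why a non-matching condition produces duplicate rows rather than an error, here is a toy, pure-Python model of the MERGE decision. It does not use Spark or Delta Lake, and `simulate_merge` and `sql_like_key` are hypothetical names. Each source row that finds a target row satisfying the condition goes to the update branch; every other source row goes to `whenNotMatchedInsert`. It assumes Spark's `trim` removes only leading and trailing spaces, which `strip(" ")` mimics.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def sql_like_key(value: str) -> str:
    """Mimic trim(upper(x)): upper-case, then strip leading/trailing spaces.
    Assumes Spark's trim removes spaces only, hence strip(" ")."""
    return value.upper().strip(" ")


def simulate_merge(target: List[Row], source: List[Row],
                   key: Callable[[str], str]) -> Tuple[List[Row], List[Row]]:
    """Toy model of MERGE: a source row whose key equals some target key
    goes to the update branch; every other source row is inserted."""
    target_keys = {key(row["Id"]) for row in target}
    updated: List[Row] = []
    inserted: List[Row] = []
    for row in source:
        if key(row["dId"]) in target_keys:
            updated.append(row)   # would hit whenMatchedUpdate
        else:
            inserted.append(row)  # would hit whenNotMatchedInsert
    return updated, inserted


target = [{"Id": "3F2504E0-4F89-11D3-9A0C-0305E82C3301"}]
source = [
    {"dId": " 3f2504e0-4f89-11d3-9a0c-0305e82c3301 "},  # spaces + lower case: matches
    {"dId": "{3f2504e0-4f89-11d3-9a0c-0305e82c3301}"},  # braces survive trim/upper: no match
]
updated, inserted = simulate_merge(target, source, sql_like_key)
print(len(updated), len(inserted))  # 1 1
```

The braced value passes through `trim(upper(...))` with only its case changed, so it never equals the plain target key and that row is inserted again on every run.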
  1. Explicitly cast both columns to string before comparing them:

    "trim(upper(cast(target.Id as string))) = trim(upper(cast(source.dId as string)))"
    
  2. Check that the key is unique on each side, for example:

    deduped_df.groupBy("dId").count().filter("count > 1").show()
    delta_table.toDF().groupBy("Id").count().filter("count > 1").show()

    NOTE: two GUIDs with the same 36-character value can still differ in representation, for example one wrapped in curly braces and the other plain. Make sure both sides use the same format.
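The two checks and the note can be combined. The sketch below is plain Python and assumes the only representation differences are surrounding spaces, letter case and curly braces. `normalize_guid` shows the intended normalization, `merge_condition` builds a Spark SQL condition string that applies the same steps with `cast`, `regexp_replace`, `trim` and `upper`, and `duplicate_keys` mirrors the `groupBy(...).count()` check. All three function names are hypothetical, and the generated condition has not been run against the tables in the question.

```python
import re
from collections import Counter
from typing import Iterable, List

# Matches "{" or "}" anywhere in the string (assumed to be the only wrappers).
_BRACES = re.compile(r"[{}]")


def normalize_guid(value: str) -> str:
    """Drop curly braces, strip surrounding spaces and upper-case the GUID."""
    return _BRACES.sub("", value).strip(" ").upper()


def merge_condition(target_col: str = "target.Id",
                    source_col: str = "source.dId") -> str:
    """Build a Spark SQL merge condition that applies the same
    normalization as normalize_guid to both sides."""
    def expr(col: str) -> str:
        # {{}} in the f-string yields the literal regex character class [{}]
        return f"upper(trim(regexp_replace(cast({col} as string), '[{{}}]', '')))"
    return f"{expr(target_col)} = {expr(source_col)}"


def duplicate_keys(keys: Iterable[str]) -> List[str]:
    """Return normalized keys that occur more than once, similar to
    df.groupBy("dId").count().filter("count > 1")."""
    counts = Counter(normalize_guid(k) for k in keys)
    return sorted(k for k, n in counts.items() if n > 1)


print(merge_condition())
print(duplicate_keys([
    "{3f2504e0-4f89-11d3-9a0c-0305e82c3301}",
    "3F2504E0-4F89-11D3-9A0C-0305E82C3301 ",
    "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
]))  # ['3F2504E0-4F89-11D3-9A0C-0305E82C3301']
```

With a real table, the string from `merge_condition()` would go in place of the original condition as the second argument to `delta_table.alias("target").merge(...)`. Normalizing the key once when the data is written, rather than inside every merge condition, keeps the condition a plain equality. Note also that a `dId` that appears more than once in the source is inserted once per copy when it matches nothing, and Delta Lake rejects the MERGE when several source rows match the same target row.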
