
The Iceberg documentation discusses using merge-on-read when deleting data. It also refers to position deletes versus equality deletes. Specifying merge-on-read in the table properties seems straightforward.

I've looked through the Iceberg documentation and also found half a dozen external sites that discuss the pros and cons of each method, but none of them describe how to specify position versus equality deletes. Is this a table property? How do I choose a method?

I'm using Spark 3.3 on EMR with Scala/Python.

1 Answer


You don't need to specify position (POS) or equality (EQ) deletes. The engine selects between these two delete methods automatically, depending on the scenario.
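What you can control is the write mode, through table properties. The snippet below is a minimal sketch of that, not a definitive recipe: it only builds the `ALTER TABLE` statement as a string, using the documented Iceberg properties `write.delete.mode`, `write.update.mode` and `write.merge.mode`. The helper name `set_write_modes_sql` and the table name `my_catalog.db.events` are made up for illustration.

```python
# Allowed values for Iceberg's row-level write mode properties.
VALID_MODES = {"copy-on-write", "merge-on-read"}


def set_write_modes_sql(table: str, mode: str = "merge-on-read") -> str:
    """Build an ALTER TABLE statement that sets the delete/update/merge modes.

    `table` is a hypothetical fully qualified name such as
    'my_catalog.db.events'; replace it with your own.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    props = {
        "write.delete.mode": mode,
        "write.update.mode": mode,
        "write.merge.mode": mode,
    }
    rendered = ", ".join(f"'{key}'='{value}'" for key, value in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({rendered})"


stmt = set_write_modes_sql("my_catalog.db.events")
print(stmt)
```

With a live session you would pass the result to `spark.sql(stmt)`. Note that this chooses merge-on-read versus copy-on-write only; it does not pick position or equality deletes.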

To get the most out of Iceberg, pay attention to the following:

  • Use merge-on-read or copy-on-write
  • Merge (compact) files according to a specified policy
  • Expire snapshots and remove deleted data
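
For the last two points, Iceberg ships Spark stored procedures such as `rewrite_data_files` and `expire_snapshots`. The following is a rough sketch that only assembles the `CALL` statements as strings. The catalog name, table name, cutoff timestamp and the helper `maintenance_sql` are assumptions for illustration, and the options each procedure accepts depend on your Iceberg version.

```python
from typing import List


def maintenance_sql(catalog: str, table: str, expire_before: str) -> List[str]:
    """Build CALL statements for compaction and snapshot expiration.

    All three arguments are placeholders: `catalog` is the Spark catalog
    name, `table` is 'db.table', and `expire_before` is a timestamp
    literal such as '2023-01-01 00:00:00'.
    """
    return [
        # Compact data files.
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        # Expire snapshots older than the cutoff so that files no longer
        # referenced by any remaining snapshot can be removed.
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{expire_before}')",
    ]


statements = maintenance_sql("my_catalog", "db.events", "2023-01-01 00:00:00")
for statement in statements:
    print(statement)
```

Each statement would then be run with `spark.sql(statement)`. Until old snapshots are expired, data removed by a delete is still reachable through those snapshots.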

Hope it helps you.


1 Comment

Thanks for that clarification, @liliwei. Is there a way to force EQ deletes? Our deletes are super expensive right now when trying to implement GDPR/CCPA opt-outs on a large data set. I've done performance testing on merge-on-read versus copy-on-write and on delete versus merge, and I don't even care about snapshots yet because Iceberg deletes are prohibitively expensive.
