I'd like to add a column to a table and then fill it with values from another table. Below is a highly simplified version of my problem.
CREATE TABLE table_1 (
id INT,
a DECIMAL(19,2)
)
INSERT INTO TABLE table_1 VALUES (1, 3.0)
INSERT INTO TABLE table_1 VALUES (2, 4.0)
CREATE TABLE table_2 (
id INT,
b DECIMAL(19,2),
c DECIMAL(19,2)
)
INSERT INTO TABLE table_2 VALUES (1, 1.0, 4.0)
INSERT INTO TABLE table_2 VALUES (2, 2.0, 1.0)
-- The next two parts illustrate what I'd like to accomplish
ALTER TABLE table_1 ADD COLUMNS (d DECIMAL(19,2))
UPDATE table_1
SET d = (table_1.a - table_2.b) / table_2.c
FROM table_2
WHERE table_1.id = table_2.id
In the end SELECT * FROM table_1 would produce something like this:
+---+----+----+
| id| a| d|
+---+----+----+
| 1|3.00|0.50|
| 2|4.00|2.00|
+---+----+----+
However, when I run the UPDATE statement, Spark (version 2.4) immediately rejects it with a parse error pointing at it:
UPDATE table_1 ...
^^^
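To be clear, computing the values is not the problem; a plain join SELECT like the following runs fine (t1 and t2 are just aliases I've added here):

SELECT t1.id, t1.a, (t1.a - t2.b) / t2.c AS d
FROM table_1 t1
JOIN table_2 t2 ON t1.id = t2.id

What I can't do is persist that result under the original table name.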
Ultimately I need a table with the same name as the original and with the new column. Using only Spark SQL, how can I accomplish this? It seems I can't perform an UPDATE, but is there a SQL workaround that achieves the same end result? In my real problem I need to add about 100 columns to a large table, so the solution should not drag down performance or make many copies of the data and eat up disk space.
Another way of phrasing my question: can I accomplish the Databricks equivalent of an UPDATE (see here) using the open-source version of Spark?
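The only workaround I can think of is to rebuild the table with the new column and swap it in, roughly like this (a sketch; table_1_new is a placeholder name):

-- Rebuild with the computed column, then swap names
CREATE TABLE table_1_new AS
SELECT t1.id, t1.a, (t1.a - t2.b) / t2.c AS d
FROM table_1 t1
LEFT JOIN table_2 t2 ON t1.id = t2.id  -- LEFT JOIN keeps unmatched rows, with d = NULL
DROP TABLE table_1
ALTER TABLE table_1_new RENAME TO table_1

However, for ~100 columns on a large table this rewrites the entire dataset, which is exactly the copy overhead I'm hoping to avoid.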