I'm trying to join two tables on a common ID, but the dates don't match across the files, so I'm trying to normalise them first.
Given this data:
+-------+-------------------+----------------------------+
|dataset|id                 |topic                       |
+-------+-------------------+----------------------------+
|2020A  |1128290566331031552|papuaNewguineaEarthquake2019|
|2020A  |1128293303659716608|papuaNewguineaEarthquake2019|
|2020A  |1152200235847966726|athensEarthquake2019        |
|2020A  |1152204892083281920|athensEarthquake2019        |
|2020A  |1152220394008522753|athensEarthquake2019        |
+-------+-------------------+----------------------------+
How would I, for example, replace the 2019 in papuaNewguineaEarthquake2019 with the first four digits of the value in the dataset column, so that
papuaNewguineaEarthquake2019 becomes papuaNewguineaEarthquake2020?
In other words, how do I use a regex to replace a group matched in one column with a group matched in another column of the same row?
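To make the intended transformation concrete, here is a minimal sketch of the row-wise logic in plain Python, using the rows from the table above. The function name `normalise_topic` and the two patterns are illustrative assumptions: the sketch assumes the year is always the last four digits of topic and the first four characters of dataset.

```python
import re

# Sample rows copied from the table above: (dataset, id, topic).
rows = [
    ("2020A", "1128290566331031552", "papuaNewguineaEarthquake2019"),
    ("2020A", "1128293303659716608", "papuaNewguineaEarthquake2019"),
    ("2020A", "1152200235847966726", "athensEarthquake2019"),
    ("2020A", "1152204892083281920", "athensEarthquake2019"),
    ("2020A", "1152220394008522753", "athensEarthquake2019"),
]

# Assumption: the year is the four digits at the very end of topic...
YEAR_AT_END = re.compile(r"\d{4}$")
# ...and the four digits at the very start of dataset.
LEADING_YEAR = re.compile(r"^\d{4}")


def normalise_topic(dataset: str, topic: str) -> str:
    """Replace the trailing year in topic with the leading year of dataset."""
    match = LEADING_YEAR.match(dataset)
    if match is None:
        # No leading year in dataset: leave topic unchanged.
        return topic
    return YEAR_AT_END.sub(match.group(0), topic)


normalised = [(d, i, normalise_topic(d, t)) for d, i, t in rows]

for row in normalised:
    print(row)
```

The table looks like Spark `show()` output. If the data is in a Spark DataFrame, the same idea can likely be written as a SQL expression along the lines of `regexp_replace(topic, '[0-9]{4}$', substring(dataset, 1, 4))`, but treat that as an untested assumption rather than a confirmed answer.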