I have two dataframes that I would like to join.
The first dataframe looks like this, where the syscode_ntwrk value (col1 here) has the syscode and ntwrk separated by a dash:
    spark.createDataFrame(
        [
            (1, '1234 - ESPN'),
            (2, '1234 - ESPN'),
            (3, '963 - CNN'),
            (4, '963 - CNN'),
        ],
        ['id', 'col1']
    )
The second is in this format, where the syscode and ntwrk are concatenated with nothing in between:
    spark.createDataFrame(
        [
            (100, '1234ESPN'),
            (297, '1234ESPN'),
            (3989, '963CNN'),
            (478, '963CNN'),
        ],
        ['counts', 'col1']
    )
Is there a way to add a new column to the second dataframe whose values match the syscode_ntwrk format of the first, so that I can join on it? The syscode will always be a group of digits and the ntwrk will always be a group of letters. Is there a regex that inserts " - " (space, dash, space) between the two?
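To make the goal concrete, here is a rough sketch of the kind of thing I have in mind, assuming the values are always digits followed only by ASCII letters. The helper names (add_dash_local, with_dashed_col) and the new column name col1_dashed are made up for illustration. The first function only checks the pattern in plain Python; the second applies the same pattern with pyspark.sql.functions.regexp_replace, which uses Java regex syntax and so refers to groups as $1 and $2.

```python
import re

# Digits (syscode) followed by letters (ntwrk), anchored to the whole value.
# Assumption: ntwrk contains only ASCII letters, as described in the question.
PATTERN = r"^([0-9]+)([A-Za-z]+)$"


def add_dash_local(value):
    """Plain-Python check of the pattern; not part of the Spark job."""
    return re.sub(PATTERN, r"\1 - \2", value)


def with_dashed_col(df, src="col1", dst="col1_dashed"):
    """Return df with a new column `dst` (hypothetical name) holding the dashed form of `src`."""
    # Imported here so the pattern check above runs without a Spark install.
    from pyspark.sql import functions as F

    # Java regex replacement syntax: $1 and $2 are the captured groups.
    return df.withColumn(dst, F.regexp_replace(F.col(src), PATTERN, "$1 - $2"))


print(add_dash_local("1234ESPN"))  # 1234 - ESPN
print(add_dash_local("963CNN"))    # 963 - CNN
```

Values that do not match the pattern would pass through unchanged, so they would simply fail to match in the join. With the new column in place, the join would then be on the first dataframe's col1 against the second dataframe's col1_dashed.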