I have a big-query schema such as this:
visitorId INTEGER NULLABLE
visitID INTEGER NULLABLE
hits RECORD REPEATED
hits.eventInfo RECORD NULLABLE
hits.eventInfo.eventCategory STRING NULLABLE
hits.eventInfo.eventLabel STRING NULLABLE
with sample data as:
visitorId visitId hits.eventInfo.eventCategory hits.eventInfo.eventCategory
123456 1 abc {"info":"secret", "otherfields":"blah"}
lmn {"info":"secret", "otherfields":"blah"}
xyz {"info":"secret", "otherfields":"blah"}
124557 1 abc {"info":"secret", "otherfields":"blah"}
lmn {"info":"secret", "otherfields":"blah"}
xyz {"info":"secret", "otherfields":"blah"}
I need to remove "info":"secret", only when the eventCategory is "abc".
I am a big-query newbie. After much hitting and trying I was able to come to this, but unfortunately stuck now.
UPDATE `project.dataset.ga_sessions_20200608`
SET hits = ARRAY(
SELECT AS STRUCT * REPLACE((REGEXP_REPLACE(eventInfo.eventLabel, r"\"info\":\"[a-z A-Z]*\",", "")) AS eventInfo.eventLabel) from UNNEST(hits)
)
WHERE (select eventInfo.eventLabel from UNNEST(hits)) LIKE '%info%'
There are two problems here.
- set part is not working :(
- subquery in where (subselect) is not giving a scalar output :'(
Any help, pointers will be appreciated.