I have a table table1 that looks like this:
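 sentence | data
----------+--------------------------------------------------------------------------------------------------
 good     | [{"pred": "yes", "prob": 0.6}, {"pred": "maybe", "prob": 0.4}, {"pred": "another", "prob": 0.7}]
 bad      | [{"pred": "unexpected", "prob": 0.4}, {"pred": "uncool", "prob": 0.3}]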
and another table table2 that looks like:
 sentence | real_values
----------+--------------------
 good     | ["another", "yes"]
 bad      | ["no"]
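To make the example reproducible, the two tables could be created roughly as follows. This is only a sketch: the column types (text for sentence, jsonb for data and real_values) and the primary keys are assumptions, since the actual table definitions are not shown here.

```sql
-- Assumed schema: both JSON columns are jsonb (not json or text[]).
CREATE TABLE table1 (
    sentence text PRIMARY KEY,
    data     jsonb NOT NULL
);

CREATE TABLE table2 (
    sentence    text PRIMARY KEY,
    real_values jsonb NOT NULL
);

-- Sample rows copied from the tables above.
INSERT INTO table1 (sentence, data) VALUES
    ('good', '[{"pred": "yes", "prob": 0.6}, {"pred": "maybe", "prob": 0.4}, {"pred": "another", "prob": 0.7}]'),
    ('bad',  '[{"pred": "unexpected", "prob": 0.4}, {"pred": "uncool", "prob": 0.3}]');

INSERT INTO table2 (sentence, real_values) VALUES
    ('good', '["another", "yes"]'),
    ('bad',  '["no"]');
```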
I want to output a boolean column indicating whether the set of values in the real_values array matches the set of pred values from data, considering only the entries where prob >= 0.5.
For the data above, the expected result is:
 sentence | preds              | real_values        | is_match
----------+--------------------+--------------------+----------
 good     | ["yes", "another"] | ["another", "yes"] | true
 bad      | []                 | ["no"]             | false
(2 rows)
Here is what I tried so far:
SELECT table1.sentence,
       -- keep only the pred values whose prob is at least 0.5
       jsonb_path_query_array(table1.data, '$[*] ? (@.prob >= 0.5).pred') AS preds,
       table2.real_values,
       (
           ARRAY(SELECT jsonb_path_query_array(table1.data, '$[*] ? (@.prob >= 0.5).pred')) <@ ARRAY(SELECT table2.real_values)
           AND ARRAY(SELECT jsonb_path_query_array(table1.data, '$[*] ? (@.prob >= 0.5).pred')) @> ARRAY(SELECT table2.real_values)
       ) AS is_match
FROM table1
CROSS JOIN table2
WHERE table1.sentence = table2.sentence
GROUP BY table1.sentence, table1.data, table2.real_values
;
It returns:
 sentence | preds              | real_values        | is_match
----------+--------------------+--------------------+----------
 good     | ["yes", "another"] | ["another", "yes"] | false
 bad      | []                 | ["no"]             | false
is_match should be true for the first row, but it is false. The comparison does work when each array contains only one element, so I am not sure what is wrong with the query.
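One possible explanation, assuming data and real_values are both jsonb: ARRAY(SELECT ...) wraps each jsonb value in a one-element jsonb[] array, so <@ and @> end up comparing whole jsonb arrays for equality, and jsonb array equality is order-sensitive. That would be consistent with single-element arrays matching while ["yes", "another"] and ["another", "yes"] do not. A minimal sketch that applies the jsonb containment operators directly (the aliases t1, t2, and p are illustrative names, not part of the original query):

```sql
SELECT t1.sentence,
       p.preds,
       t2.real_values,
       -- jsonb containment in both directions; element order and
       -- duplicates are ignored, so this behaves like set equality
       (p.preds @> t2.real_values AND p.preds <@ t2.real_values) AS is_match
FROM table1 AS t1
JOIN table2 AS t2 ON t2.sentence = t1.sentence
CROSS JOIN LATERAL (
    -- compute the filtered pred array once per row
    SELECT jsonb_path_query_array(t1.data, '$[*] ? (@.prob >= 0.5).pred') AS preds
) AS p;
```

With the sample rows shown earlier, this should give true for good and false for bad, because an empty preds array does not contain "no". If real_values were stored as text[] rather than jsonb, it would need converting first (for example with to_jsonb) before these operators apply.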