I have a table with a sentence ID and a bag-of-words vector for each sentence (tokenizedsentence::varchar[]):
sID | TokenizedSentence
1 | {0, 0, 0, 0, 1, 1, 0, 0, 1, 0}
2 | {1, 1, 0, 0, 1, 1, 1, 1, 1, 1}
3 | {0, 1, 1, 0, 1, 0, 0, 0, 1, 1}
4 | {1, 1, 0, 1, 1, 0, 1, 0, 1, 1}
5 | {1, 0, 0, 0, 1, 1, 0, 0, 1, 0}
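I want to compare sentences for similarity using the bag-of-words representation. I wrote a function, but I am missing something. The idea is to walk through two arrays position by position and, only where the value is 1 (the word is present in the sentence), check whether the other sentence has the same value at that position; if so, increase a counter. After going through all the values, divide the counter by the length of the array. For example, sentences 1 and 5 both have a 1 at positions 5, 6 and 9, so their similarity would be 3 / 10 = 0.3. The function I wrote: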
CREATE OR REPLACE FUNCTION bow() RETURNS float AS $$
DECLARE
length int:= array_length(nlpdata.tokenizedsentence, 1)
counter int;
result float;
BEGIN
FROM nlpdata a, nlpdata b;
FOR i IN 0..length LOOP
IF tokenizedSentence[i] = 1 THEN
IF a.tokenizedSentence[i] = b.tokenizedSentence[i] THEN
counter := counter + 1;
END IF;
END IF;
END LOOP;
result = counter / length
RETURN;
END;
$$ LANGUAGE plpgsql;
Also, I have no idea how to declare "FROM nlpdata a, nlpdata b" inside the function. Any ideas?
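A minimal sketch of one way this could be structured, not a definitive solution: rather than putting a FROM clause inside the function, the function could take the two arrays as parameters, and the self-join could live in the query that calls it. The name bow_similarity and the aliases sid_a, sid_b and similarity are made up for illustration. The sketch assumes both arrays are non-empty and of equal length, and that the elements are the strings '0' and '1', since the column is varchar[].

```sql
-- Hypothetical helper: fraction of positions where both sentences have a '1'.
-- Assumes a and b are non-empty, one-dimensional, and of equal length.
CREATE OR REPLACE FUNCTION bow_similarity(a varchar[], b varchar[])
RETURNS float AS $$
DECLARE
    len     int := array_length(a, 1);
    counter int := 0;  -- initialised, because NULL + 1 would stay NULL
BEGIN
    -- PostgreSQL arrays are 1-based by default, so start at 1, not 0
    FOR i IN 1..len LOOP
        -- elements are varchar, so compare against '1' rather than 1
        IF a[i] = '1' AND b[i] = '1' THEN
            counter := counter + 1;
        END IF;
    END LOOP;

    -- cast before dividing to avoid integer division
    RETURN counter::float / len;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- The self-join goes in the calling query; a.sid < b.sid keeps each
-- unordered pair once and skips comparing a sentence with itself.
SELECT a.sid AS sid_a,
       b.sid AS sid_b,
       bow_similarity(a.tokenizedsentence, b.tokenizedsentence) AS similarity
FROM nlpdata a
JOIN nlpdata b ON a.sid < b.sid
ORDER BY similarity DESC;
```

Keeping the pairing in the query means the same function can also be called on a single pair, or restricted with a WHERE clause, without changing the function itself.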