I am new to SQL so this question is probably fairly basic, yet I've spent quite a bit of time on it without success.
I was given a block of SQL code that I need to modify. This block currently extracts all tokens ("words") from a text index. What I need to do is to loop over each token and (1) extract all char N-grams of a given length N from it and then (2) insert them into a table.
So if a token is abcdf and N=3 then I need to extract abc, bcd, and cdf
I am struggling with (1).
I am doing it in Oracle's PL/SQL: Release 11.2.0.3.0
What I have now is this:
CREATE OR REPLACE PROCEDURE extract_ngrams
AS
tokens ctx_doc.token_tab;
w VARCHAR2(255);
NgramLen NUMBER;
ngr VARCHAR(10);
BEGIN
-- extract all tokens ("words")
CTX_DOC.TOKENS(myindex,textkey,tokens);
NgramLen := 3;
FOR i IN tokens.FIRST..tokens.LAST LOOP
w := tokens(i).token;
dbms_output.put_line(w);
-- Use SUBSTRING + LEN or some other approach to extract Ngrams of length=NgramLen
-- insert each Ngram into a table
end loop;
END;
/
What is the right way to combine FOR loop with SUBSTRING, LEN and/or whatever other appropriate functions to extract ngrams into ngr in a loop in a way similar to how tokens are extracted into w?
Thx!