1

I am new to SQL so this question is probably fairly basic, yet I've spent quite a bit of time on it without success.

I was given a block of SQL code that I need to modify. This block currently extracts all tokens ("words") from a text index. What I need to do is to loop over each token and (1) extract all char N-grams of a given length N from it and then (2) insert them into a table.

So if a token is abcdf and N=3 then I need to extract abc, bcd, and cdf

I am struggling with (1).

I am doing it in Oracle's PL/SQL: Release 11.2.0.3.0

What I have now is this:

CREATE OR REPLACE PROCEDURE extract_ngrams
AS
     tokens ctx_doc.token_tab;
     w VARCHAR2(255);
     NgramLen NUMBER;
     ngr VARCHAR(10);
BEGIN

-- extract all tokens ("words")
CTX_DOC.TOKENS(myindex,textkey,tokens);

NgramLen := 3;

FOR i IN tokens.FIRST..tokens.LAST LOOP
  w := tokens(i).token;
  dbms_output.put_line(w);
  -- Use SUBSTRING + LEN or some other approach to extract Ngrams of length=NgramLen
  -- insert each Ngram into a table
end loop;

END;
/

What is the right way to combine FOR loop with SUBSTRING, LEN and/or whatever other appropriate functions to extract ngrams into ngr in a loop in a way similar to how tokens are extracted into w?

Thx!

1 Answer 1

1

Here an example of what you want :

FOR idx IN 1..(LENGTH(w)-NgramLen+1)
  LOOP
    -- do what you want here
    DBMS_OUTPUT.PUT_LINE(SUBSTR(w,idx,NgramLen));
  END LOOP;

and don't forget to declare idx NUMBER;

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.