I've got a table in SQL Server with a full-text index on an NVARCHAR column, and I want my website's users to be able to search through the table for data that matches their search string. I want to use the CONTAINS predicate to improve performance.

I'm aware that I could use SQL Server's LIKE operator to achieve the same thing, but as detailed elsewhere on Stack Overflow, LIKE can't make use of a full-text index, which hurts performance.

Note that Microsoft's Query With Full-Text Search page details asterisks, double quotes, FORMSOF, AND, OR, NOT, and a whole bunch of other control characters and keywords for CONTAINS. I can write a parser to validate/sanitise the input myself, but not only is that difficult and time-consuming, I run the risk that future versions introduce new keywords that my validation misses. Anyway, this really feels like the kind of validation that Microsoft should've written themselves, so that I can reuse it easily.

How do I properly sanitise user input to CONTAINS so that every special character and keyword is treated as plain text? Alternatively, how can I validate that the input contains no special characters or keywords, so that I can return a validation message to the user if it does?

Let me be clear: I don't need to give users fancy functionality, such as the ability to search for rows that contain either 'nymph' or 'jocks'. (Even if I did, I'd want to build that statement manually, and I'd either sanitise both of their inputs to avoid malformed queries, or validate them so that I can reject anything containing special characters or keywords.) I just want them to be able to type a word and click Search without the risk that a typo sends them to the 500 error page, or that a script kiddie crashes my website with a carefully crafted string that performs a Denial-of-Service attack against CONTAINS.
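To illustrate what "building that statement manually" might look like, here is a rough sketch. It rests on an assumption I have not fully verified: that wrapping each term in double quotes, with any embedded double quote doubled, makes CONTAINS treat the term as literal text. The variable names and sample values are hypothetical.

```sql
-- Hypothetical inputs standing in for two separate search boxes.
DECLARE @First  NVARCHAR(200) = N'nymph';
DECLARE @Second NVARCHAR(200) = N'jocks" OR "vex';  -- tries to smuggle in an extra OR

-- Assumption: doubling embedded double quotes and wrapping each term
-- in double quotes stops the term from being parsed as CONTAINS syntax.
DECLARE @Condition NVARCHAR(1000) =
      N'"' + REPLACE(@First,  N'"', N'""') + N'"'
    + N' OR '
    + N'"' + REPLACE(@Second, N'"', N'""') + N'"';

-- Show the condition that would be passed to CONTAINS.
SELECT @Condition AS FullTextCondition;
```

For these sample values the SELECT returns "nymph" OR "jocks"" OR ""vex", so the only OR outside the quotes is the one I added myself. Whether CONTAINS accepts every string built this way is exactly what I'm unsure about.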

(Bonus question: is there a simpler way to build a performant and secure text search?)

Run this in SQL Server 2022 Express:

CREATE TABLE TestData
(
    Id BIGINT NOT NULL IDENTITY (1, 1) CONSTRAINT PK_TestData_Id PRIMARY KEY CLUSTERED,
    DataCol NVARCHAR(200) NOT NULL
)

GO

INSERT INTO TestData (DataCol) VALUES ('Waltz, bad nymph, for quick jigs vex.')
INSERT INTO TestData (DataCol) VALUES ('Sphinx of black quartz, judge my vow.')
INSERT INTO TestData (DataCol) VALUES ('Glib jocks, quiz nymph to vex dwarf.')
INSERT INTO TestData (DataCol) VALUES ('Cwm fjord glyphs vext bank quiz.')
--plus millions of others

GO

--Not necessary if your database already has a default full-text catalog
CREATE FULLTEXT CATALOG TestDataCatalog AS DEFAULT;

GO

CREATE FULLTEXT INDEX ON TestData (DataCol LANGUAGE 1033)
KEY INDEX PK_TestData_Id
WITH
(CHANGE_TRACKING = OFF, STOPLIST = SYSTEM)
;

GO

CREATE PROCEDURE SearchTestData
    @Name NVARCHAR(200)
AS
BEGIN
    SELECT DataCol
    FROM TestData
    WHERE CONTAINS(DataCol, @Name)
END

Now let's test the above:

EXEC SearchTestData 'nymph'

We get back both rows that contain the word 'nymph' and no other rows. We can confirm with SSMS's Display Estimated Execution Plan button that the stored procedure is using the full-text index.

So far so good, right? Wrong. The user can put CONTAINS-specific control characters and keywords into their input, accidentally or deliberately. Try this line:

EXEC SearchTestData 'nymph,'

Did you think that this would return the single entry that contains 'nymph,', or perhaps both entries with the word 'nymph' and a comma in them? Nope, it crashes:

Syntax error near ',' in the full-text search condition 'nymph,'.

Let's try SQL injection:

EXEC SearchTestData 'dsg; DROP TABLE SearchTestData; --'

At least SQL Server's query parameterisation prevents SQL injection attacks from the website input getting into the SearchTestData stored procedure, but we do get the same user input problem as before:

Syntax error near 'DROP' in the full-text search condition 'dsg; DROP TABLE SearchTestData; --'.
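For reference, this is the kind of sanitisation I have been experimenting with. It is a sketch under assumptions, not a verified fix: the procedure name SearchTestDataQuoted is hypothetical, and it relies on the idea that a double-quoted string (with embedded double quotes doubled) is parsed by CONTAINS as a single simple term.

```sql
-- Hypothetical variant of SearchTestData that quotes the user's input.
CREATE PROCEDURE SearchTestDataQuoted
    @Name NVARCHAR(200)
AS
BEGIN
    SET NOCOUNT ON;

    -- Reject blank input up front, since CONTAINS raises an error
    -- for a null or empty search condition.
    IF @Name IS NULL OR LTRIM(RTRIM(@Name)) = N''
    BEGIN
        RAISERROR(N'Please enter a search term.', 16, 1);
        RETURN;
    END

    -- Assumption: double any embedded double quotes, then wrap the whole
    -- input in double quotes so it is read as one literal term or phrase.
    -- Worst case is 200 quotes doubled to 400, plus the 2 wrapping quotes.
    DECLARE @Condition NVARCHAR(410) =
        N'"' + REPLACE(@Name, N'"', N'""') + N'"';

    SELECT DataCol
    FROM TestData
    WHERE CONTAINS(DataCol, @Condition);
END
```

With this version, EXEC SearchTestDataQuoted 'nymph,' passes the condition "nymph," to CONTAINS instead of the bare nymph, that triggered the syntax error. I have not checked every edge case: input made up solely of punctuation or stopwords may still produce an error or warning, and a trailing asterisk inside the quotes may be interpreted as a prefix search rather than as a literal character.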

  • This is a job for the application side, not the T-SQL side. Don't have the user input an expression for CONTAINS, have them provide values that you will then build to make an appropriate expression for CONTAINS. Commented Feb 28, 2024 at 17:10
  • I'm a full-stack developer. I'm writing the SQL in the back-end and building the JS/HTML/CSS in the front-end. I left it open-ended whether I'm supposed to change the input within the stored procedure or outside of it. When you say "you will then... make an appropriate expression for CONTAINS", that's exactly what I've asked for help with. Commented Feb 28, 2024 at 17:43
  • But we have no idea what language your application is written in, or what information you're taking from the user. Everything you've tagged is SQL Server related, so you are asking how to do this in SQL Server. The answer to that is: don't. Commented Feb 28, 2024 at 18:04
  • The stored procedure denotes the divide between application code and database code. I assumed that I could add a bunch of control characters to get it to work, such as changing a raw input of 'nymph' into an encoded '"nymph*"' or something. Commented Feb 28, 2024 at 18:12
  • This might help: https://stackoverflow.com/questions/139199/can-i-protect-against-sql-injection-by-escaping-single-quote-and-surrounding-use Commented Feb 28, 2024 at 18:12
