
I'm using a PostgreSQL database with VB.NET and ODBC (Windows).

I'm searching sentences for whole words by combining SELECT with a regular expression, like this:

"SELECT dtbl_id, name 
   FROM mytable 
  WHERE name ~*'" + "( |^)" + TextBox1.Text + "([^A-z]|$)'"

This works in some cases, but because of irregular punctuation in the text (or for other reasons) it sometimes fails. For example, if I have the sentence

BILLY IDOL: WHITE WEDDING

the word "white" will be found. But if I have

CLASH-WHITE RIOT

then "white" will not be found, because there is no space before the start of the word "white" (it is preceded by a hyphen).
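A minimal Python sketch that reproduces the failure. It is not PostgreSQL itself. It assumes Python's `re` treats `( |^)`, `[^A-z]` and `$` the same way as the question's `~*` pattern does for these inputs. The helper name `matches` is hypothetical.

```python
import re

# The question's pattern with the search term "white" spliced in.
# re.IGNORECASE plays the role of PostgreSQL's case-insensitive ~* operator.
term = "white"
question_pattern = re.compile("( |^)" + term + "([^A-z]|$)", re.IGNORECASE)

def matches(pattern, sentence):
    """Return True if the pattern is found anywhere in the sentence."""
    return pattern.search(sentence) is not None

print(matches(question_pattern, "BILLY IDOL: WHITE WEDDING"))  # True: a space precedes WHITE
print(matches(question_pattern, "CLASH-WHITE RIOT"))           # False: a hyphen precedes WHITE
```

The `( |^)` group only accepts a space or the start of the string before the word, so any other separator makes the match fail.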

The simplest solution would be to temporarily replace characters such as :,.\/-= in the sentences with spaces.
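A rough Python sketch of that "replace punctuation with spaces, then match" idea. The helper `normalise` and the `PUNCTUATION` class are hypothetical and cover only the characters listed above. On the server, PostgreSQL's `regexp_replace(name, '[[:punct:]]', ' ', 'g')` would be a broader variant that could sit inside the same SELECT.

```python
import re

# Hypothetical character class for : , . \ / = - (the trailing "-" is literal).
PUNCTUATION = re.compile(r"[:,.\\/=-]")

def normalise(sentence):
    """Replace each listed punctuation character with a space."""
    return PUNCTUATION.sub(" ", sentence)

# The question's original pattern, applied to the normalised text.
question_pattern = re.compile("( |^)" + "white" + "([^A-z]|$)", re.IGNORECASE)

print(normalise("CLASH-WHITE RIOT"))                                       # CLASH WHITE RIOT
print(question_pattern.search(normalise("CLASH-WHITE RIOT")) is not None)  # True
```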

Is it possible to do this in a single SELECT statement, so that it works with .NET/ODBC? Maybe inside the same regular expression?

If so, how?

1 Answer


Try this:

SELECT 'CLASH-WHITE RIOT' ~ '[[:<:]]WHITE[[:>:]]';

[[:<:]] and [[:>:]] match the beginning and the end of a word, respectively.

More information: http://www.postgresql.org/docs/9.1/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP
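A Python analogue of the word-boundary approach, for experimenting outside the database. It assumes Python's `\b` is a fair stand-in for PostgreSQL's `[[:<:]]`/`[[:>:]]` (equivalently `\m`/`\M`) for these inputs. The two engines are not identical. The helper `whole_word_pattern` is hypothetical.

```python
import re

def whole_word_pattern(term):
    # \b asserts a boundary between a word character and a non-word character
    # (or the string edge), so a hyphen or colon now counts as a separator.
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

pattern = whole_word_pattern("white")
for sentence in ["BILLY IDOL: WHITE WEDDING", "CLASH-WHITE RIOT", "WHITEWASH"]:
    print(sentence, "->", pattern.search(sentence) is not None)
# BILLY IDOL: WHITE WEDDING -> True
# CLASH-WHITE RIOT -> True
# WHITEWASH -> False
```

The last case shows that the boundaries still reject partial matches inside a longer word.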


4 Comments

Care must be taken with high Unicode code points (like Japanese letters), because the regular expression engine is not reliable with them. There is an ongoing discussion by the Postgres developers here.
Thanks Szymon, very interesting. Adding * (using ~* instead of ~) makes the match case-insensitive, which is good! But if I search for O'CONNOR, the program crashes (I think at the apostrophe). Additionally, maybe because of the Unicode problem, if the search term contains Slavic letters, as in ANDRIĆ or MEŠA, the search returns no rows (although it should, because such data exists). Any further recommendations?
To handle quoting issues, see the quote_literal() function or use whatever parameter substitution techniques are available in your development environment. To remove diacritical marks from characters before matching, see the unaccent extension: postgresql.org/docs/9.1/interactive/unaccent.html
Thanks kgritin for the very valuable information. I'm already getting some results with tsquery. Besides that, do you know how to treat characters like ".,;:" in the data sentences as if they were spaces, so that "MR.JOHN" yields the two words MR and JOHN?
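A hedged sketch of the parameter-substitution route combined with regex escaping. The query is kept as text with an ODBC `?` placeholder, and the user's term is turned into a pattern value. The names `QUERY`, `escape_are` and `whole_word_are` are hypothetical. The metacharacter list is an assumption drawn from the PostgreSQL regex docs. In the question's VB.NET code, the pattern would be added to an `OdbcCommand`'s `Parameters` collection instead of being concatenated into the SQL string.

```python
# Characters with special meaning in a PostgreSQL ARE (assumed list); a backslash
# before a non-alphanumeric character makes an ARE treat it literally.
ARE_METACHARACTERS = set(".^$*+?()[]{}|\\")

# "?" is the ODBC positional parameter marker; the driver sends the pattern as a
# value, so an apostrophe in O'CONNOR can no longer terminate the SQL string.
QUERY = "SELECT dtbl_id, name FROM mytable WHERE name ~* ?"

def escape_are(term):
    """Backslash-escape regex metacharacters so the term is matched literally."""
    return "".join("\\" + ch if ch in ARE_METACHARACTERS else ch for ch in term)

def whole_word_are(term):
    """Wrap the escaped term in the word-boundary markers from the answer."""
    return "[[:<:]]" + escape_are(term) + "[[:>:]]"

print(whole_word_are("O'CONNOR"))  # [[:<:]]O'CONNOR[[:>:]]
print(whole_word_are("MR.JOHN"))   # [[:<:]]MR\.JOHN[[:>:]]
```

Parameter binding fixes the SQL quoting problem. Escaping fixes a separate problem: user input that happens to contain regex syntax.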
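A small Python sketch of treating ".,;:" as separators. The helper `words` is hypothetical. In PostgreSQL, the same split could plausibly be done with `regexp_split_to_table(name, '[.,;:[:space:]]+')`, or the characters could be turned into spaces first with `regexp_replace(name, '[.,;:]', ' ', 'g')`.

```python
import re

# Runs of . , ; : or whitespace are treated as a single separator.
SEPARATORS = re.compile(r"[.,;:\s]+")

def words(sentence):
    """Split a sentence into words, dropping empty pieces at the edges."""
    return [w for w in SEPARATORS.split(sentence) if w]

print(words("MR.JOHN"))                    # ['MR', 'JOHN']
print(words("BILLY IDOL: WHITE WEDDING"))  # ['BILLY', 'IDOL', 'WHITE', 'WEDDING']
```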
