2

In my project, sometimes it happens that user enters non-english characters in name column. I want to search multiple tables in my database for non-english characters and delete them or replace with '' (empty string).

I've already gone through following links but still answer is not clear:

  1. Replacing multiple characters in SQL
  2. Replacing characters in access SQL
  3. MSDN: Pattern matching using Regular Expression

Looking for a SQL script that will do this task. Currently I'm using the replace method, something like :

update TBLDAILYROUTINE a   set a.NAME = replace(a.NAME,'é','')
6
  • 3
    Shouldn't the app should validate the data before it gets into the database? Commented Feb 9, 2012 at 9:32
  • 1
    Exactly, put validation while the user enters all those values. Here you can use any method. But the most reliable is Regex. Database on the other hand is just a place of storage. Don't make it to make any processing except CRUD. Commented Feb 9, 2012 at 9:37
  • I agree with client validation, but maybe this is not an option and it could be bypassed by "smart" users. Server validation/replacing just can't fail. Commented Feb 9, 2012 at 9:42
  • @Thomas Then the developer must be "smarter" than "smart" users. All in all developers has much more previlidges and flexibility to constraint the user )) Commented Feb 9, 2012 at 9:51
  • Correct! But sometimes you just can't because dumb users can always find a way to break things :-) Just an example: if the input is a web form the user can always view html source and reproduce input fields bypassing client validation... Commented Feb 9, 2012 at 9:59

3 Answers 3

2

As others have commented it's best to make the validation and replacements before the data enters the database.

In order to clean up existing data the easiest way would probably be to use a SQL CLR UDF that can be run on the columns needed to be cleaned up.

A quick and dirty C# snippet to clean off any non ASCII characters (with the side effect that '?' is also removed):

public static string StripNonAsciiContent(string source)
{
    Encoding utf8 = new UTF8Encoding();
    Encoding ascii = Encoding.ASCII;

    byte[] input = utf8.GetBytes(source); 
    byte[] output = Encoding.Convert(utf8, ascii, input);

    return ascii.GetString(output).Replace("?", "");
}

A better approach would be to write a Regular expression that replaces all characters outside of the allowed range:

public static string StripNotAllowedCharacters(string source)
{
    return Regex.Replace(source, @"[^a-zA-Z-]", "");
}
Sign up to request clarification or add additional context in comments.

Comments

0

How abount using a function? Create the function:

CREATE FUNCTION dbo.sf_ReplaceChars (@MyString NVARCHAR(MAX))
RETURNS NVARCHAR(50)
AS
BEGIN
    DECLARE @tmpString NVARCHAR(100)
    SET @tmpString = @MyString 
    SET @tmpString = LOWER(@tmpString)
    SET @tmpString =  REPLACE(@tmpString,'à', 'a')
    SET @tmpString =  REPLACE(@tmpString,'è', 'e')
    SET @tmpString =  REPLACE(@tmpString,'é', 'e')
    SET @tmpString =  REPLACE(@tmpString,'ì', 'i')
    SET @tmpString =  REPLACE(@tmpString,'ò', 'o')
    SET @tmpString =  REPLACE(@tmpString,'ù', 'u')
    SET @tmpString =  REPLACE(@tmpString,'ç', 'c')
    RETURN @tmpString
END
GO

then use it whenever needed, assuming the user input is stored in variable @userInput:

INSERT INTO myTable (text) VALUES select dbo.sf_ReplaceChars(@userInput)

You can add more REPLACE to the function to cover all your needs.

Regards Thomas

4 Comments

Error prone. What if there will be ó, ñ or even қғҳ?
You'll end up with infinite numbers of replaces and will never reach the end. This in turn will slow down your server when there will be too much replacements.
Of course you are right, but this is to fix the user input: how many input there will be? By the way he could also set up a stored proc to nightly correct every column of every table...
thats what I'm trying to do but need any optimized (and somewhat easy too :) ) way to do that.
0

Run the bellow query:

declare @table varchar(50)
declare @column varchar(50)
declare @updateCommand varchar(50)

DECLARE AllMyColumns CURSOR FOR
    SELECT top 1 OBJECT_NAME(c.OBJECT_ID) TableName, c.name ColumnName
    FROM sys.columns AS c
    JOIN sys.types AS t ON c.user_type_id=t.user_type_id
    WHERE t.name = 'varchar' --select your datatype
    ORDER BY c.OBJECT_ID
OPEN AllMyColumns
FETCH NEXT FROM AllMyColumns into @table, @column;

WHILE @@FETCH_STATUS = 0 BEGIN
     set @updateCommand = 'update '+ @table + ' set '+ @column + ' = replace('+ @column +',''é'','''')'

    exec(@updateCommand)
    FETCH NEXT FROM AllMyColumns into @table, @column;
end

close AllMyColumns
DEALLOCATE AllMyColumns

I edited with the update command you suggested

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.