
I've recently moved to a different platform for my personal website, and I've run into a problem where characters from the previous encoding, such as " and ', now show up strangely as:

“
”
’
’
&#039;

I've seen this before, and last time, I went through manually and updated each article. This time, however, I'd like to take a more pragmatic approach by updating the database.

How would I go about replacing all occurrences of these strings with their correct character?

I'm thinking it would be something like:

SELECT REPLACE('&#039;',''')

But do I need to be cautious and include escape characters like \? Also, how would I perform this type of replacement across the entire database?

Note: I'll be using phpMyAdmin to perform these replacements, so I'm hoping that it's just a matter of typing a series of commands into the "SQL" tab. Although, I do have access to the MySQL server from the command line if it's necessary.
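
For the database-wide part of the question, one hedged sketch is to first list every text-like column via information_schema and then run an UPDATE ... REPLACE() against each one by hand; the schema name below is a placeholder, not the site's actual database name.

SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'my_database'   -- placeholder schema name
  AND DATA_TYPE IN ('char', 'varchar', 'text', 'mediumtext', 'longtext');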

Update:

More about the structure:

  • The table name is "field_data_comment_body"
  • The field name is "comment_body_value"
  • The field in question is of type "longtext"

I've tried running Johan's recommendation, but it returns 0 Affected rows:

DELIMITER $$

CREATE FUNCTION FixEncoding(input longtext) RETURNS longtext
BEGIN
  DECLARE output longtext;

  SET output = input;   
  SET output = REPLACE(output,'&#039;','\'');
  SET output = REPLACE(output,'’','\'');
  SET output = REPLACE(output,'”','"');
  SET output = REPLACE(output,'“','"');
  SET output = REPLACE(output,'’','\'');

  RETURN output;
END $$

DELIMITER ;

UPDATE field_data_comment_body SET comment_body_value = FixEncoding(comment_body_value) WHERE entity_id <> 0;
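
As a quick sanity check, a sketch using the same table and column names is to call the function in a SELECT and compare input and output before running the UPDATE:

SELECT comment_body_value,
       FixEncoding(comment_body_value) AS fixed_value
FROM field_data_comment_body
WHERE comment_body_value LIKE '%&#039;%'
LIMIT 5;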

Update: It's not a translation error as this returns 63 rows:

SELECT  `comment_body_value` 
FROM  `field_data_comment_body` 
WHERE  `comment_body_value` LIKE  '%&#039;%'
LIMIT 0 , 30
  • How many rows does a select count(*) as rowcount from field_data_comment_body WHERE entity_id <> 0; return? Commented May 21, 2011 at 23:47
  • 63 rows. I think I got it sorted with: update field_data_comment_body set comment_body_value = replace(comment_body_value,'&#039;','\''); (a consolidated sketch of this fix follows these comments.) Although it's not clear why your function didn't work, perhaps something to do with the field type? Commented May 21, 2011 at 23:49
  • It works with varchar, not sure 'bout longtext. Commented May 22, 2011 at 0:12
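
A consolidated sketch of that direct approach, using the table and column names from the question and the strings discussed above (best run against a backup first):

UPDATE field_data_comment_body
SET comment_body_value =
    REPLACE(
      REPLACE(
        REPLACE(comment_body_value, '&#039;', '\''),
        '”', '"'),
      '“', '"')
WHERE entity_id <> 0;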

1 Answer


In MySQL characters can be escaped by using \.
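
For example, both of these forms embed a literal single quote in the replacement string (a minimal illustration, not tied to any particular table):

SELECT REPLACE('don&#039;t', '&#039;', '\'');   -- backslash escape
SELECT REPLACE('don&#039;t', '&#039;', '''');   -- doubled single quote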

I'd write a function to do the replacing for you and then do an update, something like this:

DELIMITER $$

CREATE FUNCTION FixEncoding(input varchar(5000)) RETURNS varchar(5000)  -- varchar parameters need an explicit length
BEGIN
  DECLARE output varchar(5000);

  SET output = input;   
  SET output = REPLACE(output,'&#039;','\'');
  SET output = REPLACE(output, .....
  .....

  RETURN output;
END $$

DELIMITER ;

UPDATE table1 SET column1 = FixEncoding(Column1) WHERE id <> 0;

If this doesn't work, then you might be suffering from a translation issue between the database and the presentation layer.
Make a backup of your database
and change the encoding of your table(s) by using:

ALTER TABLE `test`.`test` CHARACTER SET latin1 COLLATE latin1_general_ci;
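
Before altering anything, it may be worth confirming which character set the table and column currently use; a quick check with the table name from the question:

SHOW CREATE TABLE field_data_comment_body;       -- the CREATE statement includes the current CHARSET / COLLATE
SHOW FULL COLUMNS FROM field_data_comment_body;  -- shows the collation of each column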

4 Comments

I've tried running an adaptation of the code you provided, but it seems that it has no effect on the rows.
In that case you are suffering from a translation issue between the database and the presentation layer; try changing the encoding of the database to latin1.
I'm not sure that's the case; the query I've added to my question displays the affected rows as expected.
While this function didn't work for longtext, it still led me to my answer +1 +accept, thanks Johan!
