I am extracting URLs and link titles from a post's content, but the titles no longer seem to be UTF-8: they include stray characters such as "Â" when I echo the result. Any idea why the correct charset isn't being used? My headers do declare the right charset.

I tried some of the solutions on here, but none seems to work so I thought I'd add my code below - just in case I'm missing something.

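// Database connection (credentials redacted).
$servername = "localhost";
$database = "xxxx";
$username = "xxxxx";
$password = "xxxx";
$conn = mysqli_connect($servername, $username, $password, $database);

$post_id = 228;

// Load the post content into a DOM tree. The XML declaration prefix
// asks the parser to read the markup as UTF-8.
$content_post = get_post($post_id);
$content = $content_post->post_content;
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $content);

$links = $doc->getElementsByTagName('a');

// File extensions that mark image links, which are skipped.
$avoid = array('.jpg', '.png', '.gif', '.jpeg');

// Prepared statement, so quotes in a title or URL cannot break the query.
$sql = "INSERT INTO wp_urls_download (title, url) VALUES (?, ?)";
$stmt = mysqli_prepare($conn, $sql);

$counter = 0;
foreach ($links as $link) {
    $href = $link->getAttribute('href');

    // Keep only links whose href contains none of the image extensions.
    if ($href == str_replace($avoid, '', $href)) {
        $title = $link->nodeValue;
        $title = html_entity_decode($title, ENT_NOQUOTES, 'UTF-8');

        mysqli_stmt_bind_param($stmt, 'ss', $title, $href);
        if (mysqli_stmt_execute($stmt)) {
            $counter++;
            echo "Entry" . $counter . ": $title" . "<br>";
        } else {
            echo "Error: " . $sql . "<br>" . mysqli_stmt_error($stmt);
        }
    }
}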

Update: I changed the echo statement after I initially posted the code. I have already tried the solutions in the other posts, without success.

  • Because you're not setting your database connection encoding?! Commented Aug 20, 2018 at 11:24
  • Hmm, not really. I see what you are getting at, but I am just echoing the $title value on the screen, so the database connection does not get involved (yet). Commented Aug 20, 2018 at 11:40
  • You are echoing what where exactly? What encoding is the content in? Commented Aug 20, 2018 at 11:43
  • Ah, my bad, I updated my code after posting this. I have now added the updated echo code, where it just echoes the $title. I have also added $title = html_entity_decode($title, ENT_NOQUOTES, 'UTF-8'); but without success. The original content is in UTF-8. Commented Aug 20, 2018 at 11:48
  • Show bin2hex($title) and what you expect the title to look like. Commented Aug 20, 2018 at 11:48

2 Answers

2

Did you try to set the utf8 charset on the connection?

$conn->set_charset('utf8');

For more information: http://php.net/manual/en/mysqli.set-charset.php

2 Comments

This didn't fix it for me when loading the content, but setting it on the connection is a good idea. I hadn't thought of that. Thanks.
Let me know if it works. I had the same situation before, and changing the connection encoding worked well for me.
1

It seems that you have "double-encoding". What you expected was

Transverse Abdominis (TVA)

But what you have for the space before the parenthesis is a special space that probably came from Microsoft Word, then got converted to utf8 twice. In hex: A0 -> c2a0 -> c382c2a0.

Yes, the link to "utf8 all the way through" would ultimately provide the fix, but I think you need more help.

The A0 was converted from latin1 to utf8, then those bytes were treated as if they were latin1 and the conversion was repeated.
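
The byte chain A0 -> c2a0 -> c382c2a0 can be reproduced in a few lines. This is only a minimal sketch, assuming the mbstring extension is available; the helper name latin1_to_utf8 is made up for illustration and is not part of the question's code.

```php
<?php
// Hypothetical helper: treat $bytes as latin1 (ISO-8859-1) and encode them as UTF-8.
function latin1_to_utf8(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}

$nbsp  = "\xA0";                  // non-breaking space as a single latin1 byte
$once  = latin1_to_utf8($nbsp);   // correct UTF-8 encoding of U+00A0
$twice = latin1_to_utf8($once);   // UTF-8 bytes misread as latin1 and converted again

echo bin2hex($nbsp), ' -> ', bin2hex($once), ' -> ', bin2hex($twice), PHP_EOL;
// prints: a0 -> c2a0 -> c382c2a0
```

The leading c3 82 pair is the UTF-8 encoding of "Â", which is why that character appears in front of the space when the doubly encoded title is shown on a UTF-8 page.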

The connection should declare the client's encoding via mysqli_obj->set_charset('utf8') (or similar).

Then the column in the table should be CHARACTER SET utf8mb4 (or utf8). Verify with SHOW CREATE TABLE. (It is probably latin1 currently.)

HTML should start with <meta charset=UTF-8>.

Trouble with UTF-8 characters; what I see is not what I stored

3 Comments

Ah, that makes sense. I managed to get it to work while my question was locked, but it is good to understand what happened and why. Thanks.
@Remco - I hope you did not fix it with str_replace; that will fix only the one case; other cases may show up with different messes.
Oh no. I must say that I was tempted, but I realised that it would have been a risky and messy "fix". $doc->loadHTML('<?xml encoding="utf-8" ?>'. $content); did the trick for me.
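
The effect of that prefix can be checked in isolation. The following is a minimal sketch, assuming the dom extension is enabled; the sample markup and the helper name first_link_text are invented for illustration. What happens without the prefix depends on the parser's default encoding, so only the prefixed case is shown.

```php
<?php
// Hypothetical helper: return the text of the first <a> element in an HTML fragment,
// asking the parser to read the fragment as UTF-8 via the XML declaration prefix.
function first_link_text(string $html): string
{
    libxml_use_internal_errors(true);   // keep parser warnings out of the output
    $doc = new DOMDocument();
    $doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
    libxml_clear_errors();

    $links = $doc->getElementsByTagName('a');
    return $links->length > 0 ? $links->item(0)->nodeValue : '';
}

// Sample fragment with a UTF-8 non-breaking space (bytes c2 a0) before the parenthesis.
$html  = "<p><a href=\"https://example.com/tva\">Transverse Abdominis\u{00A0}(TVA)</a></p>";
$title = first_link_text($html);

// Hex dump of the extracted title; the space should appear as c2a0, not c382c2a0.
echo bin2hex($title), PHP_EOL;
```

Dumping the title with bin2hex, as suggested in the comments on the question, makes it easy to see whether the space survived as c2a0 or was encoded a second time.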
