apologies if this is a really stupid question or already asked elsewhere. I'm reading in some JSON and using decode_json on it, then extracting text from it and outputting that to a file.
My problem is that Unicode characters are encoded as eg \u2019 in the JSON, decode_json appears to convert this to \x{2019}. When I grab this text and output to a UTF8-encoded file, it appears as garbage.
Sample code:
use warnings;
use strict;
use JSON qw( decode_json );
use Data::Dumper;
open IN, $file or die;
binmode IN, ":utf8";
my $data = <IN>;
my $json = decode_json( $data );
open OUT, ">$outfile" or die;
binmode OUT, ":utf8";
binmode STDOUT, ":utf8";
foreach my $textdat (@{ $json->{'results'} }) {
print STDOUT Dumper($textdat);
my $text = $textdat->{'text'};
print OUT "$text\n";
}
The Dumper output shows that the \u encoding has been converted to \x encoding. What am I doing wrong?