I am looking for a way of parsing real JavaScript object data (which, for a number of reasons, doesn't conform to the JSON standard) from within Perl.

I have found that the JSON module does a fair job if the allow_singlequote and allow_barekey options are enabled, but I am still having problems parsing single-quoted values that contain escaped single quotes or unescaped double quotes. For instance,

{ label : 'can\'t process' }

and

{ label : '"bad" character' }

throw

illegal backslash escape sequence in string

and

invalid character encountered while parsing JSON string

respectively, because the module accepts only the standard JSON escape sequences and always rejects an unescaped double quote, regardless of which quote character delimits the string.
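A minimal way to reproduce both failures is sketched below. It assumes JSON::PP (the pure-Perl backend that the JSON module can fall back to) behaves the same way as the setup described above; `try_decode` is a hypothetical helper written only for this illustration.

```perl
use strict;
use warnings;
use JSON::PP;

# Assumption: JSON::PP stands in for the JSON module used in the question.
my $parser = JSON::PP->new->allow_singlequote->allow_barekey;

# Hypothetical helper: returns (data, undef) on success, (undef, error) on failure.
sub try_decode {
    my ($text) = @_;
    my $data = eval { $parser->decode($text) };
    return defined $data ? ($data, undef) : (undef, $@);
}

my @samples = (
    q({ label : 'plain value' }),        # bare key, single quotes: accepted
    q({ label : 'can\'t process' }),     # escaped single quote
    q({ label : '"bad" character' }),    # unescaped double quote
);

for my $text (@samples) {
    my ($data, $error) = try_decode($text);
    if (defined $data) {
        print "ok:   $text\n";
    }
    else {
        print "fail: $text\n      $error\n";
    }
}
```

Wrapping `decode` in `eval` keeps the script running after each parse error, so all three inputs can be compared in one run.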

I thought I had found something that would work in the JSON::DWIW module, but it hasn't been updated since 2010 and I can't get it to install.

My only answer so far is to install a full-blown JavaScript engine using the JavaScript module and run the string as JavaScript code. This works fine, but it is far from straightforward and very much overkill for what I want.

Does anyone have any suggestions for alternatives that I could try?

1 Answer

There are quite a few JSON modules knocking around on CPAN, and many of them have tolerant modes. However, none seem to be able to deal with this particular situation. (I thought JSONY might, but sadly not. However, I imagine if you report this case the author might be keen to make it work.)

My quick and dirty suggestion would be to take an existing pure Perl JSON module, and hack that to get it to work. JSON::Tiny is a good candidate. The following patch to the latest JSON::Tiny release on CPAN seems to do the trick for the two short examples you've provided:

--- Tiny.orig   2014-02-22 22:17:50.923272286 +0000
+++ Tiny.pm 2014-02-22 22:18:23.847435546 +0000
@@ -160,11 +160,13 @@
   until (m/\G$WHITESPACE_RE\}/gc) {

     # Quote
-    m/\G$WHITESPACE_RE"/gc
+    m/\G$WHITESPACE_RE(["']|[^\W0-9]\w*)/gc
       or _exception('Expected string while parsing object');

     # Key
-    my $key = _decode_string();
+    my $key = ($1 eq '"' || $1 eq "'")
+      ? _decode_string($1)
+      : $1;

     # Colon
     m/\G$WHITESPACE_RE:/gc
@@ -187,13 +189,14 @@
 }

 sub _decode_string {
+  my $quote = shift;
   my $pos = pos;
   # Extract string with escaped characters
-  m!\G((?:(?:[^\x00-\x1f\\"]|\\(?:["\\/bfnrt]|u[0-9a-fA-F]{4})){0,32766})*)!gc; # segfault on 5.8.x in t/20-mojo-json.t #83
+  m!\G((?:(?:[^\x00-\x1f\\$quote]|\\(?:[$quote\\/bfnrt]|u[0-9a-fA-F]{4})){0,32766})*)!gc; # segfault on 5.8.x in t/20-mojo-json.t #83
   my $str = $1;

   # Invalid character
-  unless (m/\G"/gc) {
+  unless (m/\G$quote/gc) {
     _exception('Unexpected character or invalid escape while parsing string')
       if m/\G[\x00-\x1f\\]/;
     _exception('Unterminated string');
@@ -247,7 +250,8 @@
   m/\G$WHITESPACE_RE/gc;

   # String
-  return _decode_string() if m/\G"/gc;
+  return _decode_string(q["]) if m/\G"/gc;
+  return _decode_string(q[']) if m/\G'/gc;

   # Array
   return _decode_array() if m/\G\[/gc;
@@ -268,6 +272,9 @@
   # Null
   return undef if m/\Gnull/gc;  ## no critic (return)

+  # Bareword string
+  return $1 if m/\G([^\W0-9]\w*)/gc;
+
   # Invalid character
   _exception('Expected string, array, object, number, boolean or null');
 }
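
The core idea of the patch, building the string regex around whichever quote character opened the string, can be tried in isolation. The following is only a sketch under my own assumptions: `read_quoted` and `%ESCAPE` are hypothetical names for this example, not part of JSON::Tiny, and `\uXXXX` escapes are accepted but left undecoded.

```perl
use strict;
use warnings;

# Single-character escapes and their replacements (hypothetical table for this sketch).
my %ESCAPE = (
    '"' => '"',    "'" => "'",    '\\' => '\\',   '/' => '/',
    'b' => "\x08", 'f' => "\x0c", 'n'  => "\x0a", 'r' => "\x0d", 't' => "\x09",
);

# Hypothetical helper: read one quoted string from the start of $text,
# accepting either quote style. Returns nothing if the string is invalid.
sub read_quoted {
    my ($text) = @_;

    # Opening quote decides which character ends the string.
    $text =~ m/\G\s*(["'])/gc or return;
    my $quote = $1;

    # Body: characters other than control characters, backslash and the
    # active quote, or a backslash escape (including the active quote).
    $text =~ m!\G((?:[^\x00-\x1f\\$quote]|\\(?:[$quote\\/bfnrt]|u[0-9a-fA-F]{4}))*)!gc;
    my $body = $1;

    # Closing quote must follow, otherwise the string is unterminated or invalid.
    $text =~ m/\G$quote/gc or return;

    # Unescape single-character escapes; \uXXXX sequences are left as they are.
    $body =~ s!\\(["'\\/bfnrt])!$ESCAPE{$1}!g;
    return $body;
}

print read_quoted(q('can\'t process')),   "\n";
print read_quoted(q('"bad" character')),  "\n";
print read_quoted(q("it's fine")),        "\n";
```

Because the opposite quote is just an ordinary character inside the character class, a double quote needs no escaping inside single quotes and vice versa, which is the behaviour the two examples in the question rely on.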