When you have a long string of text to parse, it is probably best practice to deconstruct the string piece by piece instead of trying to use one BIG LONG regular expression. I was able to parse the JSON string piece by piece into the hash %keyValueHash via the following steps.
- Remove the outer curly braces `{ ... }` from the entire line
- Split the entire line on `],` or `]$` into `@keyValuePair`
- Split each element of `@keyValuePair` on `:`
- Remove the outer double quotes `" ... "` from `$key`
- Remove the outer brackets `[ ... ]` from `$value`
- Remove all double quotes from `$value`
- Split `$value` on commas `,` and store the pieces in an anonymous array; `@values` holds a reference to that array
- Find the maximum lengths of `@keys` and `@values` for table formatting
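To see what each step does to the data, here is a minimal trace of a single pair, `"isps":["a","B"]`, taken from the sample string. It is a sketch that mirrors the steps above rather than the full script; the names `$pair` and `$list` are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One pair as produced by the '],' split step (taken from the sample string)
my $pair = '"isps":["a","B"]';

# split on the first ':' only
my ($key, $value) = split(/:/, $pair, 2);   # '"isps"' and '["a","B"]'

# remove outer double quotes from $key
$key =~ s/^"([\w\W]*?)"/$1/;                # 'isps'

# remove outer brackets from $value
$value =~ s/^\[([\w\W]*?)\]$/$1/;           # '"a","B"'

# remove all double quotes from $value
$value =~ s/"//g;                           # 'a,B'

# split on commas into an anonymous array; $list holds a reference to it
my $list = [split(/,/, $value)];            # ['a', 'B']

print "$key => (@$list)\n";                 # prints: isps => (a B)
```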
Here is the code. Each line is a self-contained command instead of one long, mysterious regular expression.
```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s = '{"device_types":["smartphone"],"isps":["a","B"],"network_types":[],"countries":[],"category":["Jebb","Bush"],"carriers":[],"exclude_carriers":[]}';
my (@keyValuePair, @keys, @values, %keyValueHash);

# for printf table formatting
my ($largestKey, $largestValue) = (-1, -1);

# remove outer curly braces from entire line; /r (Perl 5.14+) returns a copy, so $s is preserved
my $copy = $s =~ s/^\{([\w\W]*?)\}$/$1/r;

# separate entire line on '],' or ']$'
while ($copy =~ /([\w\W]*?)(\])(,|$)/g) {
    push(@keyValuePair, $1 . $2);
}

# separate each @keyValuePair on the first ':' only
for (@keyValuePair) {
    my ($key, $value) = split(/:/, $_, 2);

    # remove double quotes from $key
    $key =~ s/^"([\w\W]*?)"/$1/;
    push(@keys, $key);

    # remove outer brackets from $value
    $value =~ s/^\[([\w\W]*?)\]$/$1/;

    # remove all double quotes from $value
    $value =~ s/"//g;

    # split $value on ',' and store in an anonymous array; @values holds a reference to it
    push(@values, [split(/,/, $value)]);

    # find maximum lengths of keys and values for printf table formatting
    $largestKey = length($key) if length($key) > $largestKey;
    for my $v (@{ $values[-1] }) {
        # +2 because each value is printed surrounded by double quotes
        my $width = length($v) + 2;
        $largestValue = $width if $width > $largestValue;
    }
}

# populate %keyValueHash with keys and values (hash slice)
@keyValueHash{@keys} = @values;

# print everything in key "category"
my $key = "category";
print "Printing key \"$key\":\n";
printf("%-${largestKey}s : ", $key);
for (@{ $keyValueHash{$key} }) {    # dereference the array reference
    printf("%-${largestValue}s", "\"$_\"");
}
print "\n\n";

# print entire hash in printf formatted table
print "Print all keys and values:\n";
for my $k (sort keys %keyValueHash) {
    printf("%-${largestKey}s : ", $k);
    for (@{ $keyValueHash{$k} }) {    # dereference the array reference
        printf("%-${largestValue}s", "\"$_\"");
    }
    print "\n";
}
```
The output looks like this:

```
$ perl json.string.pl
Printing key "category":
category         : "Jebb"      "Bush"

Print all keys and values:
carriers         :
category         : "Jebb"      "Bush"
countries        :
device_types     : "smartphone"
exclude_carriers :
isps             : "a"         "B"
network_types    :
```
If you can't use a proper JSON parser such as JSON::XS, this piece-by-piece approach is probably your best option. It is much easier to maintain if you need to add or change functionality later, and much simpler than trying to do the same parsing in raw SQL.
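When a JSON module is available, the same data can be decoded in a single call. The sketch below uses JSON::PP, which has shipped with Perl since 5.14; JSON::XS provides a `decode_json` function with the same interface. The column-width calculation is an illustrative addition, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # bundled with Perl 5.14 and later

my $s = '{"device_types":["smartphone"],"isps":["a","B"],"network_types":[],"countries":[],"category":["Jebb","Bush"],"carriers":[],"exclude_carriers":[]}';

# decode_json returns a hash reference; each value here is an array reference
my $data = decode_json($s);

# width of the longest key, for aligned output
my $width = 0;
for (keys %$data) {
    $width = length($_) if length($_) > $width;
}

for my $k (sort keys %$data) {
    printf("%-${width}s : %s\n", $k, join(' ', map { "\"$_\"" } @{ $data->{$k} }));
}
```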
You can connect to the MariaDB database from Perl through DBI, using either the DBD::MariaDB driver or the DBD::mysql driver. I covered how to do this in detail in this answer...
I used the DBD::mysql driver. The two drivers are supposed to be interchangeable, but DBD::mysql was the only one that installed properly for me. You can try the libdbd-mariadb-perl package, but it gave me an error.
I installed the necessary packages on Ubuntu with the following command:

```shell
sudo apt install libdbi-perl libdbd-mysql-perl
```
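Once that is installed, connect to the database and run the parsing code above on each JSON string you want to decode; the data you are looking for will be in %keyValueHash. Because the script declares @keyValuePair, @keys, @values, and %keyValueHash once at the top, empty them before processing each new string so that results from earlier strings do not accumulate. After a string is decoded, you can print reports or run additional inserts, updates, or selects against the database using that data.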
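One way to avoid resetting shared variables is to wrap the steps in a subroutine that returns a fresh hash reference for each string. In this sketch, `parse_json_string`, `mydb`, `json_col`, and `some_table` are hypothetical names, and the `@rows` array stands in for values fetched through DBI (the DBI calls are shown only as comments, since they need a live database).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the parsing steps wrapped in a subroutine that returns
# a hash reference (key => array reference of values) for one JSON string.
sub parse_json_string {
    my ($s) = @_;
    my %keyValueHash;
    my $copy = $s =~ s/^\{([\w\W]*?)\}$/$1/r;          # strip outer braces
    while ($copy =~ /([\w\W]*?)(\])(,|$)/g) {            # split on '],' or ']$'
        my ($key, $value) = split(/:/, $1 . $2, 2);
        $key   =~ s/^"([\w\W]*?)"/$1/;                   # unquote key
        $value =~ s/^\[([\w\W]*?)\]$/$1/;                # strip brackets
        $value =~ s/"//g;                                # drop quotes
        $keyValueHash{$key} = [split(/,/, $value)];
    }
    return \%keyValueHash;
}

# Stand-in for values fetched from the database. With DBI and DBD::mysql this
# could look like (hypothetical credentials, table and column names):
#   my $dbh  = DBI->connect('DBI:mysql:database=mydb;host=localhost',
#                           $user, $password, { RaiseError => 1 });
#   my $rows = $dbh->selectcol_arrayref('SELECT json_col FROM some_table');
my @rows = (
    '{"category":["Jebb","Bush"],"carriers":[]}',
    '{"category":["a"],"isps":["x","y"]}',
);

for my $row (@rows) {
    my $h = parse_json_string($row);    # fresh hash for every string
    for my $k (sort keys %$h) {
        print "$k: @{ $h->{$k} }\n";
    }
}
```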
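The patterns use `[\w\W]` (which matches any character, including a newline) rather than `.`, so they can match across newlines. However, any whitespace between tokens, such as the newlines and indentation in pretty-printed JSON, ends up at the start of a key or value and stops the quote and bracket patterns from matching, so strip that whitespace before parsing.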
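The difference between `.` and `[\w\W]`, and one possible pre-step for pretty-printed input, can be sketched as follows. The whitespace-stripping substitution is an assumption, not part of the original script: it keeps double-quoted strings (including escaped quotes) unchanged and removes every other run of whitespace, producing the compact form the parsing code expects.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# '.' does not match a newline (without /s); [\w\W] matches any character
my $text = "line one\nline two";
my $dot  = $text =~ /^(.*)$/      ? 'matched' : 'no match';
my $any  = $text =~ /^([\w\W]*)$/ ? 'matched' : 'no match';
print "dot: $dot, [\\w\\W]: $any\n";   # prints: dot: no match, [\w\W]: matched

# Pretty-printed input: whitespace between tokens would break the key/value patterns
my $pretty = <<'END';
{
  "category": ["Jebb", "Bush"],
  "carriers": []
}
END

# Assumed pre-step: keep double-quoted strings (including escaped quotes) as-is
# and delete every other run of whitespace
my $compact = $pretty =~ s/("(?:[^"\\]|\\.)*")|\s+/defined $1 ? $1 : ''/ger;
print "$compact\n";   # prints: {"category":["Jebb","Bush"],"carriers":[]}
```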