I have a table in a SQL Server database that has a column containing JSON.

Table - SomeTable
Id | Properties
1  | {"field1":"value1", "field2":"value2", "field3":"value3", "field4":"value4"}
2  | {"field1":"value1", "field2":"value2", "field3":"value3", "field4":"value4"}
...|...

I wrote a SELECT query that extracts the value of each field separately:

SELECT
    JSON_VALUE(Properties, '$.field1') as field1,
    JSON_VALUE(Properties, '$.field2') as field2,
    JSON_VALUE(Properties, '$.field3') as field3,
    JSON_VALUE(Properties, '$.field4') as field4
FROM SomeTable

I found this approach in Microsoft's docs (https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-ver15)

Can writing many JSON_VALUE calls in one query harm performance? Does SQL Server deserialize the string again for each JSON_VALUE in the query?

  • I'm pretty sure the answer is "yes". But it is easy enough to check by timing simple queries (on enough data) with different numbers of references. Commented Jul 29, 2020 at 12:47
  • Is there a better way to get the JSON fields separately in SQL Server? Commented Jul 29, 2020 at 12:51
  • Using OPENJSON would likely be better. Commented Jul 29, 2020 at 12:52

2 Answers

You may try to use OPENJSON() with an explicit schema to parse the JSON stored in the Properties column with a single function call (instead of four or more JSON_VALUE() calls):

Table:

CREATE TABLE SomeTable (
   Id int,
   Properties varchar(1000)
)
INSERT INTO SomeTable (Id, Properties)
VALUES
   (1, '{"field1":"value1", "field2":"value2", "field3":"value3", "field4":"value4"}'),
   (2, '{"field1":"value1", "field2":"value2", "field3":"value3", "field4":"value4"}')

Statement:

SELECT s.Id, j.*
FROM SomeTable s
CROSS APPLY OPENJSON(s.Properties) WITH (
   field1 varchar(100) '$.field1',
   field2 varchar(100) '$.field2',
   field3 varchar(100) '$.field3',
   field4 varchar(100) '$.field4'
) j

Result:

Id  field1  field2  field3  field4
----------------------------------
1   value1  value2  value3  value4
2   value1  value2  value3  value4
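
One behavioral difference to keep in mind when switching from JSON_VALUE() to CROSS APPLY OPENJSON(): as far as I know, OPENJSON() returns an empty rowset for a NULL input, so CROSS APPLY drops rows whose Properties value is NULL, while JSON_VALUE() simply returns NULL for them. A minimal sketch, assuming a table variable @SomeTable that stands in for SomeTable and a hypothetical row 3 with no JSON:

```sql
-- Table variable standing in for SomeTable (an assumption for illustration).
-- Row 3 is hypothetical and has no JSON at all.
DECLARE @SomeTable TABLE (Id int, Properties varchar(1000));

INSERT INTO @SomeTable (Id, Properties)
VALUES
   (1, '{"field1":"value1", "field2":"value2"}'),
   (3, NULL);

-- CROSS APPLY keeps only the rows for which OPENJSON() produces a row.
SELECT s.Id, j.field1, j.field2
FROM @SomeTable s
CROSS APPLY OPENJSON(s.Properties) WITH (
   field1 varchar(100) '$.field1',
   field2 varchar(100) '$.field2'
) j;

-- OUTER APPLY keeps every row of @SomeTable and fills the JSON columns
-- with NULL when OPENJSON() produces nothing.
SELECT s.Id, j.field1, j.field2
FROM @SomeTable s
OUTER APPLY OPENJSON(s.Properties) WITH (
   field1 varchar(100) '$.field1',
   field2 varchar(100) '$.field2'
) j;
```

If every row is guaranteed to hold a JSON object, the two forms should return the same rows and CROSS APPLY is fine.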

As an additional note, the result of JSON_VALUE() is a scalar value of type nvarchar(4000). With OPENJSON() and an explicit schema you may define the appropriate data type for each returned column.
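
To illustrate the point about data types, here is a small sketch. The JSON document, its field names (name, quantity, price, addedOn), and the chosen types are assumptions made up for this example, not taken from the question. Note that OPENJSON() requires database compatibility level 130 or higher.

```sql
-- Hypothetical JSON document; field names and types are assumptions for illustration.
DECLARE @json nvarchar(max) =
    N'{"name":"widget", "quantity":"12", "price":19.99, "addedOn":"2020-07-29"}';

-- JSON_VALUE() always returns nvarchar(4000), so typed results need explicit conversions.
SELECT
    JSON_VALUE(@json, '$.name')                          AS name,
    CAST(JSON_VALUE(@json, '$.quantity') AS int)         AS quantity,
    CAST(JSON_VALUE(@json, '$.price') AS decimal(10, 2)) AS price,
    CAST(JSON_VALUE(@json, '$.addedOn') AS date)         AS addedOn;

-- OPENJSON() with an explicit schema declares each type once in the WITH clause.
SELECT j.name, j.quantity, j.price, j.addedOn
FROM OPENJSON(@json) WITH (
    name     varchar(100)   '$.name',
    quantity int            '$.quantity',
    price    decimal(10, 2) '$.price',
    addedOn  date           '$.addedOn'
) AS j;
```

Both statements should return one row with the same values; the second one keeps the type definitions next to the JSON paths instead of scattering CAST calls through the select list.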

2 Comments

That is it! Thanks for the response.
@TigranPetrosyan, OPENJSON is better if you want to select multiple columns. See the performance comparison done in the SO link: stackoverflow.com/questions/62506490/…

I thought I would do a test in regards to your question:

Can writing many JSON_VALUE calls in one query harm performance? Does SQL Server deserialize the string again for each JSON_VALUE in the query?

So I created the script below and tested it on my home workstation:

/*
Id | Properties
1  | {"field1":"value1", "field2":"value2", "field3":"value3", "field4":"value4"}
2  | {"field1":"value1", "field2":"value2", "field3":"value3", "field4":"value4"}
*/

CREATE TABLE dbo.YourTable (ID int IDENTITY,
                            Properties nvarchar(4000));

WITH N AS(
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM N N1, N N2, N N3, N N4, N N5, N N6)
INSERT INTO dbo.YourTable (Properties)
SELECT N'{"field1":"value1", "field2":"value2", "field3":"value3", "field4":"value4"}'
FROM Tally;
GO

DECLARE @StartTime datetime2(7) = SYSDATETIME();

SELECT
    JSON_VALUE(Properties, '$.field1') as field1,
    JSON_VALUE(Properties, '$.field2') as field2,
    JSON_VALUE(Properties, '$.field3') as field3,
    JSON_VALUE(Properties, '$.field4') as field4
FROM dbo.YourTable;

PRINT DATEDIFF(MILLISECOND,@StartTime, SYSDATETIME());

SET @StartTime = SYSDATETIME();

SELECT YT.Id,
       OJ.field1,
       OJ.field2,
       OJ.field3,
       OJ.field4
FROM dbo.YourTable YT
     CROSS APPLY OPENJSON(YT.Properties)
                 WITH(field1 varchar(50),
                      field2 varchar(50),
                      field3 varchar(50),
                      field4 varchar(50)) OJ;

PRINT DATEDIFF(MILLISECOND,@StartTime, SYSDATETIME());

GO 10

DROP TABLE dbo.YourTable

This uses your solution, with JSON_VALUE and also an OPENJSON solution (which I'd written around the same time as Zhorov posted his answer). On my workstation, this resulted in the below output from the messages:

(1000000 rows affected)
5273
(1000000 rows affected)
3560
(1000000 rows affected)
5196
(1000000 rows affected)
3329
(1000000 rows affected)
5097
(1000000 rows affected)
3320
(1000000 rows affected)
5219
(1000000 rows affected)
3379
(1000000 rows affected)
5133
(1000000 rows affected)
3239
(1000000 rows affected)
5137
(1000000 rows affected)
3352
(1000000 rows affected)
5080
(1000000 rows affected)
3348
(1000000 rows affected)
5126
(1000000 rows affected)
3320
(1000000 rows affected)
5146
(1000000 rows affected)
3360
(1000000 rows affected)
5310
(1000000 rows affected)
3354

Notice that the second query, with OPENJSON, is roughly 1.7 to 2.0 seconds faster in every run (about 3.3 seconds against about 5.2 seconds). This is, however, with 1,000,000 rows, but it does confirm that a single parse of the JSON is noticeably faster than four separate JSON_VALUE calls.
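
One caveat about timings like these: they include the time needed to send 1,000,000 rows to the client and display them. A variation that may isolate the JSON parsing cost better is to assign the results to variables, so no result set is returned. This is only a sketch under assumptions: the temp table #JsonTest, its smaller row count, and the variable names are hypothetical, and the numbers will differ by machine.

```sql
-- Hypothetical scratch table, smaller than the one used in the test above.
CREATE TABLE #JsonTest (Id int IDENTITY, Properties nvarchar(4000));

-- Cross joining a system view with itself is just a convenient row source.
INSERT INTO #JsonTest (Properties)
SELECT TOP (100000)
       N'{"field1":"value1", "field2":"value2", "field3":"value3", "field4":"value4"}'
FROM sys.all_objects a
     CROSS JOIN sys.all_objects b;

-- Assigning to variables discards the rows instead of sending them to the client.
DECLARE @f1 nvarchar(4000), @f2 nvarchar(4000),
        @f3 nvarchar(4000), @f4 nvarchar(4000);
DECLARE @StartTime datetime2(7) = SYSDATETIME();

SELECT @f1 = JSON_VALUE(Properties, '$.field1'),
       @f2 = JSON_VALUE(Properties, '$.field2'),
       @f3 = JSON_VALUE(Properties, '$.field3'),
       @f4 = JSON_VALUE(Properties, '$.field4')
FROM #JsonTest;

PRINT CONCAT('JSON_VALUE ms: ', DATEDIFF(MILLISECOND, @StartTime, SYSDATETIME()));

SET @StartTime = SYSDATETIME();

-- Same types as JSON_VALUE() returns, so both queries do comparable work.
SELECT @f1 = OJ.field1,
       @f2 = OJ.field2,
       @f3 = OJ.field3,
       @f4 = OJ.field4
FROM #JsonTest JT
     CROSS APPLY OPENJSON(JT.Properties)
                 WITH (field1 nvarchar(4000),
                       field2 nvarchar(4000),
                       field3 nvarchar(4000),
                       field4 nvarchar(4000)) OJ;

PRINT CONCAT('OPENJSON ms: ', DATEDIFF(MILLISECOND, @StartTime, SYSDATETIME()));

DROP TABLE #JsonTest;
```

Running the batch several times and comparing the printed values gives a rough picture; for anything beyond that, the actual execution plans are the better source.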

1 Comment

Interesting to find your post. I'm looking at an implementation processing large json blobs. I'm pulling 41 values out of it. The last guy used JSON_VALUE() for everything (41 in the select, and another 4 in the where clause). I figured that would lose out to OPENJSON easily. But I'm investigating why it's the opposite. In the query plans, all the JSON_VALUEs are in one Compute Scalar node. In the OPENJSON plan, there's an extra nested loops join.
