I have JSON documents in my Couchbase cluster that look like this:

 {
    "giata_properties": {
      "propertyCodes": {
        "provider": [
          {
            "code": [
              {
                "value": [
                  {
                    "name": "Country Code",
                    "value": "EG"
                  },
                  {
                    "name": "City Code",
                    "value": "HRG"
                  },
                  {
                    "name": "Hotel Code",
                    "value": "91U"
                  }
                ]
              }
            ],
            "providerCode": "gta",
            "providerType": "gds"
          },
          {
            "code": [
              {
                "value": [
                  {
                    "value": "071801"
                  }
                ]
              },
              {
                "value": [
                  {
                    "value": "766344"
                  }
                ]
              }
            ],
            "providerCode": "restel",
            "providerType": "gds"
          },
          {
            "code": [
              {
                "value": [
                  {
                    "value": "HRG03Z"
                  }
                ]
              },
              {
                "value": [
                  {
                    "value": "HRG04Z"
                  }
                ]
              }
            ],
            "providerCode": "5VF",
            "providerType": "tourOperator"
          }
        ]
      }
    }
  }

I'm trying to create a query that fetches a single document based on the value of giata_properties.propertyCodes.provider.code.value.value and a specific providerType.

So for example, my input is 071801 and restel, I want a query that will fetch me the document I pasted above (because it contains these values).

I'm pretty new to N1QL, so what I have tried so far (without the providerType input) is:

SELECT * FROM giata_properties AS gp 
WHERE ANY `field` IN `gp.propertyCodes.provider.code.value` SATISFIES `field.value` = '071801' END;

This returns me an empty result set. I'm probably doing all of this wrongly.

edit1:

Following geraldss's answer, I was able to achieve my goal with two different queries:

1st (More general) ~2m50.9903732s

SELECT * FROM giata_properties AS gp WHERE ANY v WITHIN gp SATISFIES v.`value` = '071801' END;

2nd (More specific) ~2m31.3660388s

SELECT * FROM giata_properties AS gp WHERE ANY v WITHIN gp.propertyCodes.provider[*].code SATISFIES v.`value` = '071801' END;

The bucket has around 550K documents. Currently there are no indexes other than the primary one.

Question part 2

When I run either of the above queries, a result is streamed to my shell very quickly, and then I spend the rest of the query time waiting for the engine to finish iterating over all documents. I'm sure that future queries will return only one result, so I thought I could use LIMIT 1 to make the engine stop searching at the first result. I tried something like:

SELECT * FROM giata_properties AS gp WHERE ANY v WITHIN gp SATISFIES v.`value` = '071801' END LIMIT 1;

But that made no difference: a document is written to my shell, and then I keep waiting until the query finishes completely. How can this be configured correctly?

edit2:

I've upgraded to the latest enterprise release, 4.5.1-2844. I have only the primary index created on the giata_properties bucket. When I execute the query with LIMIT 1, it still takes the same time; it doesn't stop any sooner.

I've also tried creating the array index you suggested, but the query doesn't use it and keeps insisting on the #primary index (even if I use the USE INDEX clause).

I tried removing SELF from the index you suggested and it took a much longer time to build and now the query can use this new index, but I'm honestly not sure what I'm doing here.

So 3 questions:

1) Why doesn't LIMIT 1 with the primary index make the query stop at the first result?

2) What's the difference between the index you suggested with and without SELF? I looked for documentation on the SELF keyword but couldn't find anything.

This is how both indexes look in the web UI:

Index 1 (Your original suggestion) - Not working

CREATE INDEX `gp_idx1` ON `giata_properties`((distinct (array (`v`.`value`) for `v` within (array_star((((self.`giata_properties`).`propertyCodes`).`provider`)).`code`) end)))

Index 2 (Without SELF)

CREATE INDEX `gp_idx2` ON `giata_properties`((distinct (array (`v`.`value`) for `v` within (array_star(((self.`propertyCodes`).`provider`)).`code`) end)))

3) What would be the query for a specific giata_properties.propertyCodes.provider.code.value.value and a specific providerCode? I managed to do both separately but I wasn't successful in merging them.

Thanks for all your help dear

  • Does your document actually contain the giata_properties field? Commented Oct 13, 2016 at 16:10
  • @geraldss No, sorry. Here is a real sample of the documents I have in the giata_properties bucket pastebin.com/mbXFMPkA Commented Oct 16, 2016 at 7:13
  • Ok, that would explain any mismatches in the solution, @pretty. Commented Oct 16, 2016 at 14:59
  • @geraldss My bad. Could you please answer my 3 questions above? I really need a query (and an index) that fetches a document based on value and providerCode. Commented Oct 16, 2016 at 15:08
  • Can you post a new clean question. I'll answer. Commented Oct 16, 2016 at 15:11

1 Answer

Here is a query without the providerType.

EXPLAIN SELECT *
FROM giata_properties AS gp
WHERE ANY v WITHIN gp.giata_properties.propertyCodes.provider[*].code SATISFIES v.`value` = '071801' END;
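
To make the matching rule concrete, here is a minimal Python sketch that models what the predicate above tests. It is not N1QL and says nothing about how Couchbase executes the query. It assumes the documents really have the giata_properties wrapper shown in the question, and it assumes WITHIN visits every descendant (array elements and object values) of the expression. The names within, has_value and matches are made up for this illustration. The optional provider_code argument shows how the value test could be combined with a providerCode test on the same provider entry; in N1QL that would presumably mean nesting the ANY ... WITHIN inside an ANY ... IN over the provider array, which I have not run here.

```python
# Trimmed copy of the sample document from the question.
doc = {
    "giata_properties": {
        "propertyCodes": {
            "provider": [
                {
                    "code": [{"value": [
                        {"name": "Country Code", "value": "EG"},
                        {"name": "City Code", "value": "HRG"},
                        {"name": "Hotel Code", "value": "91U"},
                    ]}],
                    "providerCode": "gta",
                    "providerType": "gds",
                },
                {
                    "code": [
                        {"value": [{"value": "071801"}]},
                        {"value": [{"value": "766344"}]},
                    ],
                    "providerCode": "restel",
                    "providerType": "gds",
                },
            ]
        }
    }
}


def within(node):
    """Yield every descendant of node: array elements and object values, recursively."""
    if isinstance(node, dict):
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return  # scalars have no descendants
    for child in children:
        yield child
        yield from within(child)


def has_value(node, wanted):
    """Model of: ANY v WITHIN node SATISFIES v.`value` = wanted END."""
    return any(isinstance(v, dict) and v.get("value") == wanted
               for v in within(node))


def matches(doc, wanted, provider_code=None):
    """True if some provider entry holds the wanted code value.

    When provider_code is given, that same entry must also carry it.
    """
    providers = doc["giata_properties"]["propertyCodes"]["provider"]
    for provider in providers:
        if provider_code is not None and provider.get("providerCode") != provider_code:
            continue
        if has_value(provider.get("code", []), wanted):
            return True
    return False


print(matches(doc, "071801"))            # value only
print(matches(doc, "071801", "restel"))  # value and providerCode together
print(matches(doc, "071801", "gta"))     # value exists, but under another provider
```

Note that the intermediate objects whose "value" is an array never match, because an array is never equal to the string being searched for; only the innermost objects do.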

You can also index this in Couchbase 4.5.0 and above:

CREATE INDEX idx1 ON giata_properties( DISTINCT ARRAY v.`value` FOR v WITHIN SELF.giata_properties.propertyCodes.provider[*].code END );

Edit to answer question edits

The performance has been addressed in 4.5.x. You should try the following on Couchbase 4.5.1 and post the execution times here.

  • Test on 4.5.1.
  • Create the index.
  • Use the LIMIT. In 4.5.1, the limit is pushed down to the index.

4 Comments

  • Thanks geraldss, that works, and it's really nice that you don't have to specify which value field you're referring to in N1QL (does that impact performance, by the way?). But can you clarify exactly how I can specify in N1QL which field it should be looking at? For example, I want my query to look only at gp.propertyCodes.provider.code.value.value and not at all fields named value. Also, can I create the index you suggested on the community edition 4.1.0?
  • Array indexing is only available in 4.5 and above. I will edit the answer to be a bit more specific.
  • I wonder whether this more specific query performs better than the earlier, more general one. I'll run benchmarks soon. Thanks a lot.
  • Posted some benchmarks and an update to my question; help would be very much appreciated.
