I want to create a Neptune DB and load data into it. I export historical data from DynamoDB to S3 as CSV files. The header of each CSV file looks like this:

~id, someproperties:String, ~label
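For illustration, a file with this kind of header could be produced roughly as follows. This is only a sketch: the record fields (Id, State), the 'Event' label, and the helper name to_neptune_vertex_csv are placeholders for this example, not my actual export code.

```python
import csv
import io

# Hypothetical records; the field names are assumptions for illustration.
events = [
    {"Id": "e-1", "State": "OPEN"},
    {"Id": "e-2", "State": "CLOSED"},
]


def to_neptune_vertex_csv(records, label="Event"):
    """Render records as CSV text with a ~id, State:String, ~label header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row: element id, one typed property column, vertex label.
    writer.writerow(["~id", "State:String", "~label"])
    for record in records:
        writer.writerow([record["Id"], record["State"], label])
    return buffer.getvalue()


csv_text = to_neptune_vertex_csv(events)
print(csv_text)
```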

Next, I need to stream real-time updates into this Neptune DB through Lambda. In the Lambda function, I check whether a vertex (or edge) already exists; if it does, I update it, otherwise I create a new one. In Python, my implementation looks like this:

g.V().hasLabel('Event').has(T.id, event['Id']).fold().coalesce(unfold(), addV('Event').property(T.id, event['Id'])).property(Cardinality.single, 'State', event['State']).property('sourceData', event['sourceData']).next()

Here I have some questions:

  1. During real-time streaming, I need to check whether a vertex with a given id already exists, which means querying the vertices loaded from the historical data. Can has(T.id, event['Id']) do this, or should I use has(id, event['Id']) or has("id", event['Id'])?
  2. I was using g.V().has('Event', T.id, event['Id']) instead of g.V().hasLabel('Event').has(T.id, event['Id']), but got an error like "cannot locate NeptuneGraphTraversal.has()". Are these two queries the same thing?

1 Answer

Here are the three bits of Gremlin you asked about:

g.V().has(T.id, "some-id")
g.V().has(id, "some-id")
g.V().has("id", "some-id")

The first two return the same result, because id is a member of T (as a point of style, Gremlin users typically statically import id so that it can be referenced that way for brevity). The last traversal is different from the first two because, as a String value, "id" refers to a standard property key named "id". Generally speaking, TinkerPop recommends that you not use a property key name like "id" or "label", as it can lead to mistakes and confusion with the values of T.

As for the second part of your question revolving around:

g.V().has('Event', T.id, event['Id']) 
g.V().hasLabel('Event').has(T.id, event['Id'])

As Kelvin points out, you can't pass T.id to the 3-ary form of has(), because the step signature only allows a String in that second position. It also wouldn't make sense to allow T there: T.label is already accounted for by the first argument, and T.id refers to the actual graph element identifier. If you know that value, you wouldn't bother specifying the T.label in the first place, as the T.id already uniquely identifies the element. You would just do g.V(event['Id']).

2 Comments

I didn't think in 3-ary form the second parameter could be anything but a String key value. It fails for me with TinkerGraph also.
ugh....see, this is why you don't overload T.id with a property key called "id". just leads to all kind of misunderstanding. fixed my answer. thanks kelvin
