Environment Setup

I'm working with a distributed JanusGraph architecture deployed on Azure Kubernetes Service (AKS):

Infrastructure:

  • AKS Cluster: 2 nodes (16 vCPU, 64 GB RAM each)
  • Cassandra: 2 replicas with sharding enabled (Kubernetes pods)
  • Elasticsearch: 2 replicas with sharding enabled (Kubernetes pods)
  • JanusGraph: Single replica connected to both backends (Kubernetes pod)
  • Mixed index: Created on the title and nt property keys (see the creation sketch just below this list)

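For context, the mixed index was created along these lines (a sketch rather than the exact script I ran; it assumes JanusGraph Server exposes the graph under the binding graph, that the Elasticsearch backend is registered as 'search', and the index name byTitleAndNt is a placeholder):

from gremlin_python.driver.client import Client

# Submit a one-off management script (Groovy) to JanusGraph Server.
client = Client('ws://janusgraph:8182/gremlin', 'g')
index_script = """
mgmt = graph.openManagement()
idx = mgmt.buildIndex('byTitleAndNt', Vertex.class)
idx = idx.addKey(mgmt.getPropertyKey('title'))
idx = idx.addKey(mgmt.getPropertyKey('nt'))
idx.buildMixedIndex('search')
mgmt.commit()
"""
client.submit(index_script).all().result()
client.close()
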
Connection Pool Implementation:

I've implemented thread-safe connection pooling where each keyspace has its own cached traversal:

import logging
import threading

from django.conf import settings
from django.views import View
from gremlin_python.driver import serializer
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.traversal import Order

logger = logging.getLogger(__name__)


class BaseGremlinClass(View):
    # Class-level cache shared by all instances: one connection/traversal per keyspace.
    _connections = {}
    _lock = threading.Lock()

    def get_traversal(self, keyspace_name):
        """Get or create a traversal for the given keyspace."""
        if keyspace_name not in settings.JANUSGRAPH_KEYSPACES:
            raise ValueError(f"Keyspace {keyspace_name} not found in settings")

        with self._lock:
            if keyspace_name not in self._connections:
                self._create_connection(keyspace_name)
            print("Getting Connection from Pool")
            return self._connections[keyspace_name]['traversal']

    def _create_connection(self, keyspace_name):
        """Create a new connection and traversal and cache them."""
        try:
            config = settings.JANUSGRAPH_KEYSPACES[keyspace_name]
            connection = DriverRemoteConnection(
                config['url'],
                config['graph'],
                message_serializer=serializer.GraphSONSerializersV3d0(),
                timeout=30,
                pool_size=10,
                max_workers=4,
            )
            traversal_g = traversal().withRemote(connection)
            self._connections[keyspace_name] = {
                'connection': connection,
                'traversal': traversal_g
            }
            logger.info(f"Created connection for keyspace {keyspace_name}")
        except Exception as e:
            logger.error(f"Error creating connection to {keyspace_name}: {e}")
            raise


class GremlinQueries:
    def __init__(self, keyspace_name='main'):
        base = BaseGremlinClass()
        # evaluationTimeout of 0 disables the per-request evaluation timeout.
        self.g = base.get_traversal(keyspace_name).with_('evaluationTimeout', 0)
        self.keyspace_name = keyspace_name

    def get_all_nodes_label(self):
        """Return the distinct node types (values of the 'nt' property)."""
        data = self.g.V().has('nt').values('nt').dedup().order().by(Order.asc).to_list()
        return data
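
For completeness, the queries are exercised roughly like this from the Django layer (a simplified sketch; 'main' is one of the keyspaces configured in settings.JANUSGRAPH_KEYSPACES and the printed values are illustrative):

# Each request constructs a GremlinQueries instance, but the underlying
# DriverRemoteConnection comes from the class-level cache, so no new
# WebSocket connection is opened after the first call per keyspace.
queries = GremlinQueries(keyspace_name='main')
node_types = queries.get_all_nodes_label()
print(node_types)  # e.g. ['Author', 'Document', ...]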

Performance Issue

Despite having connection pooling, indexing, and sharding in place, I'm observing the following pattern (it can be reproduced with a timing loop like the one sketched after this list):

  1. First query execution: Takes significantly longer (e.g., 45 seconds)
  2. Second query: Runs in roughly half the time (~20-25 seconds)
  3. Subsequent queries: Maintain the improved performance of the second run
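
A minimal sketch of the kind of timing loop that shows this pattern (not my exact benchmark; the keyspace name 'main' and the repetition count are illustrative):

import time

def time_query(runs=3, keyspace='main'):
    """Run the same traversal several times and print per-run latency."""
    queries = GremlinQueries(keyspace_name=keyspace)
    for i in range(runs):
        start = time.perf_counter()
        queries.get_all_nodes_label()
        elapsed = time.perf_counter() - start
        print(f"run {i + 1}: {elapsed:.1f}s")

# Typical output pattern on my setup:
# run 1: ~45s, run 2: ~20-25s, run 3: ~20-25s
time_query()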

Questions

  1. Why does the first Gremlin query take significantly longer than subsequent runs in a Kubernetes environment, even with connection pooling and indexing?
  2. What Kubernetes-specific factors might be contributing to this cold start behavior?
  3. What optimisations can be implemented to reduce the first-time latency in a containerised distributed setup?
  4. Are there specific considerations for sharded Cassandra/Elasticsearch deployments on Kubernetes that could impact initial query performance?

What I've Tried

  • Verified that connection pooling is working and connections are reused (see the check sketched after this list)
  • Confirmed mixed indexes are properly created and being used
  • Checked that subsequent queries with the same or different parameters show the improved performance
  • Monitored the connection pool to confirm it avoids reconnection overhead
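
The connection-reuse check was essentially the following (a sketch; it just asserts that repeated lookups return the same cached traversal and connection objects for a keyspace):

# Two independent instances should resolve to the same cached traversal.
a = BaseGremlinClass().get_traversal('main')
b = BaseGremlinClass().get_traversal('main')
assert a is b, "traversal was not reused from the class-level cache"

conn = BaseGremlinClass._connections['main']['connection']
print("cached connection:", conn)  # created once per keyspace, reused afterwards
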
  • Both Cassandra and JanusGraph use a cache (see yaaics.blogspot.com/2018/04/…), so I would expect exactly this kind of cold-start behaviour. Commented Jun 7 at 11:44
