Control Script
This is legacy Apache Ignite documentation.
The new documentation is hosted here: https://ignite.apache.org/docs/latest/
Apache Ignite provides the ./control.sh command-line script that allows you to monitor and control the cluster's state. It can be found in the /bin/ folder of the Apache Ignite installation directory.
Activation, Deactivation and Topology Management
The first thing ./control.sh is capable of is cluster activation/deactivation and management of the set of nodes that represents the baseline topology. Refer to the baseline topology documentation for more details.
Deactivation deallocates all memory resources, including your application data, on all cluster nodes and disables the public cluster API. If you have in-memory caches that are not backed by persistent storage, you will lose the data and will have to repopulate these caches.
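For quick reference, the activation and baseline commands look roughly as follows. This is a minimal sketch: node1 and node2 are placeholder consistent IDs, and the full syntax and options are described in the baseline topology documentation.
# Prints the current cluster state (active or inactive).
./control.sh --state
# Activates the cluster.
./control.sh --activate
# Deactivates the cluster.
./control.sh --deactivate
# Prints the current baseline topology.
./control.sh --baseline
# Adds nodes with the given consistent IDs (placeholders here) to the baseline topology.
./control.sh --baseline add node1,node2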
Cache State Monitoring
./control.sh provides a set of commands starting with --cache list for cache monitoring. These commands let you see the list of deployed caches with their affinity parameters and their distribution within cache groups. There is also a command for viewing existing atomic sequences.
# Displays a list of all caches with affinity parameters.
./control.sh --cache list .*
# Displays a list of caches whose names start with "account-", with affinity parameters.
./control.sh --cache list account-.*
# Displays info about cache group distribution for all caches.
./control.sh --cache list .* groups
# Displays info about cache group distribution for caches whose names start with "account-".
./control.sh --cache list account-.* groups
# Displays info about all atomic sequences.
./control.sh --cache list .* seq
# Displays info about atomic sequences whose names start with "counter-".
./control.sh --cache list counter-.* seq
Contention Detection in Transactions
The contention command allows you to see multiple transactions that are contending for a lock on the same key. The command is useful if you have long-running or hanging transactions. Example:
# Reports all keys that are a point of contention for at least 5 transactions on all cluster nodes.
./control.sh --cache contention 5
# Reports all keys that are a point of contention for at least 5 transactions on a specific server node.
./control.sh --cache contention 5 f2ea-5f56-11e8-9c2d-fa7a
If there are any highly contended keys, the utility will dump extensive information, including the keys, transactions, and nodes where the contention took place. Example:
[node=TcpDiscoveryNode [id=d9620450-eefa-4ab6-a821-644098f00001, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47501], discPort=47501, order=2, intOrder=2, lastExchangeTime=1527169443913, loc=false, ver=2.5.0#20180518-sha1:02c9b2de, isClient=false]]
// No contention on node d9620450-eefa-4ab6-a821-644098f00001.
[node=TcpDiscoveryNode [id=03379796-df31-4dbd-80e5-09cef5000000, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1527169443913, loc=false, ver=2.5.0#20180518-sha1:02c9b2de, isClient=false]]
TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=CREATE, val=UserCacheObjectImpl [val=0, hasValBytes=false], tx=GridNearTxLocal[xid=e9754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439646, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1247], other=[]]
TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=READ, val=null, tx=GridNearTxLocal[xid=8a754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439656, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1175], other=[]]
TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=READ, val=null, tx=GridNearTxLocal[xid=6a754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439654, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1175], other=[]]
TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=READ, val=null, tx=GridNearTxLocal[xid=7a754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439655, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1175], other=[]]
TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=READ, val=null, tx=GridNearTxLocal[xid=4a754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439652, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1175], other=[]]
// Node 03379796-df31-4dbd-80e5-09cef5000000 is the point of contention for key KeyCacheObjectImpl [part=0, val=0, hasValBytes=false].
Consistency Check Commands
The script comes with a set of commands that verify internal data consistency invariants.
First, the commands can be used for debugging and troubleshooting, especially during active development.
Second, if you suspect that a query (such as an SQL query) returns an incomplete or wrong result set, the commands can verify whether a data inconsistency actually exists.
Finally, the consistency check commands can be used as part of regular cluster health monitoring.
Let's review these usage scenarios in more detail.
Verification of Partition Checksums
Even if update counters and sizes are equal on primary and backup nodes, the primary and backup copies may still diverge due to a critical failure. The idle_verify command of the ./control.sh utility calculates and compares partition hashes across the whole cluster and reports any differences. You can specify a list of caches to verify, as follows:
# Checks that the partitions of all caches actually contain the same data.
./control.sh --cache idle_verify
# Checks that the partitions of the specified caches actually contain the same data.
./control.sh --cache idle_verify cache1,cache2,cache3
If any partitions diverge, a list of conflict partitions will be printed out, as follows:
idle_verify check has finished, found 2 conflict partitions.
Conflict partition: PartitionKey [grpId=1544803905, grpName=default, partId=5]
Partition instances: [PartitionHashRecord [isPrimary=true, partHash=97506054, updateCntr=3, size=3, consistentId=bltTest1], PartitionHashRecord [isPrimary=false, partHash=65957380, updateCntr=3, size=2, consistentId=bltTest0]]
Conflict partition: PartitionKey [grpId=1544803905, grpName=default, partId=6]
Partition instances: [PartitionHashRecord [isPrimary=true, partHash=97595430, updateCntr=3, size=3, consistentId=bltTest1], PartitionHashRecord [isPrimary=false, partHash=66016964, updateCntr=3, size=2, consistentId=bltTest0]]
The cluster should be idle during the idle_verify check
All updates must be stopped while idle_verify calculates hashes; otherwise, it may report false-positive errors. It is impossible to compare large data sets in a distributed system while they are being constantly updated.
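As mentioned earlier, the consistency check commands can be wired into regular cluster health monitoring. The following is a minimal sketch of such a wrapper script; the grep pattern assumes the summary line format shown above, and scheduling (for example, via cron) is up to your environment.
#!/bin/bash
# Hypothetical health-check wrapper around idle_verify (sketch only).
# Run it from the /bin/ folder of the Ignite installation, or adjust the path.
OUTPUT=$(./control.sh --cache idle_verify)
echo "$OUTPUT"
# The summary line reads "... found N conflict partitions." when partitions diverge.
if echo "$OUTPUT" | grep -q "conflict partitions"; then
    echo "idle_verify detected conflict partitions" >&2
    exit 1
fi
exit 0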
SQL Indexes Consistency Validation
The validate_indexes command validates the indexes of the given caches on all cluster nodes.
The validation process checks the following:
- Every key-value entry that is referenced from the primary index must also be reachable from the secondary SQL indexes, if any.
- Every key-value entry that is referenced from the primary index must be reachable; a reference from the primary index must not point to a non-existent entry.
- Every key-value entry that is referenced from a secondary SQL index must also be reachable from the primary index.
# Checks indexes of all caches on all cluster nodes.
./control.sh --cache validate_indexes
# Checks indexes of specific caches on all cluster nodes.
./control.sh --cache validate_indexes cache1,cache2
# Checks indexes of specific caches on the node with the given node ID.
./control.sh --cache validate_indexes cache1,cache2 f2ea-5f56-11e8-9c2d-fa7a
If indexes refer to non-existent entries (or some entries are not indexed), errors will be dumped to the output, as follows:
PartitionKey [grpId=-528791027, grpName=persons-cache-vi, partId=0] ValidateIndexesPartitionResult [updateCntr=313, size=313, isPrimary=true, consistentId=bltTest0]
IndexValidationIssue [key=0, cacheName=persons-cache-vi, idxName=_key_PK], class org.apache.ignite.IgniteCheckedException: Key is present in CacheDataTree, but can't be found in SQL index.
IndexValidationIssue [key=0, cacheName=persons-cache-vi, idxName=PERSON_ORGID_ASC_IDX], class org.apache.ignite.IgniteCheckedException: Key is present in CacheDataTree, but can't be found in SQL index.
validate_indexes has finished with errors (listed above).
The cluster should be idle during the validate_indexes check
Like idle_verify, the index validation tool works correctly only if updates are stopped. Otherwise, there may be a race between the checker thread and a thread that updates an entry or index, which will result in a false-positive error report.
