
I noticed a strange behavior in Elasticsearch (version 5.5.0) where store.size decreased while docs.count increased. Why does this happen?

$ curl 'localhost:9201/_cat/indices/index-name:2017-08-08?bytes=b&v'
health status index                 uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   index-name:2017-08-08 PlpLYu5vTN-HFA_ygHUNwg  17   1    5577181       212434 3827072602     1939889776

$ curl 'localhost:9201/_cat/indices/index-name:2017-08-08?bytes=b&v'
health status index                 uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   index-name:2017-08-08 PlpLYu5vTN-HFA_ygHUNwg  17   1    5581202       204815 3812410150     1927833617

Note that while docs.count increased from 5577181 to 5581202, both store.size and pri.store.size decreased.

For background, I'm trying to use index size to throttle the data going into ES (i.e. at most x GB per day). However, I notice that as I continue indexing, the index size decreases periodically (every hour or so, sometimes within minutes). This makes it a poor throttling signal, since the storage size isn't strictly increasing.
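Here is roughly the check I'm doing (just a sketch; the index name and the 10 GB limit below are placeholders):

$ # grab the primary store size in bytes for today's index
$ SIZE=$(curl -s 'localhost:9201/_cat/indices/index-name:2017-08-08?bytes=b&h=pri.store.size' | tr -d ' ')
$ # stop sending data once the daily limit (here 10 GB) is crossed
$ [ "$SIZE" -gt 10737418240 ] && echo "daily limit reached, throttling"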

1) Any idea why the index size decreases?
2) Is there another size I should use that is strictly increasing?

EDIT: Actually, even when there are no deleted documents, the index size still decreases. See below:

$ curl -s localhost:9200/_cat/indices | grep name
green open index-name:2017-08-11 eIGiDgeZQ5CqSu3tAaLRgw 1 1 111717 0 210.4mb 109.5mb

$ curl -s localhost:9200/_cat/indices | grep name
green open index-name:2017-08-11 eIGiDgeZQ5CqSu3tAaLRgw 1 1 132329 0 204.7mb 103.2mb

2 Answers

The Elasticsearch cluster compresses indices over time, so the _stats API (and _cat/indices) may show the index size shrinking for a while even as documents are added. An index of similar documents may be compressed by as much as 40%.

EDIT: as mentioned in the other answer, segment merges happen under the hood as long as documents keep being indexed. After each merge, the new segment appears to be compressed afresh, and since compressing data together tends to be at least as effective as compressing the pieces separately (compress(A) + compress(B) >= compress(A+B)), the index can shrink even while its document count grows.
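You can watch this happen: the _cat/segments API lists the live segments per shard, and polling it while indexing should show several small segments being replaced by fewer, larger ones (command is a sketch using the index name from the question):

$ curl -s 'localhost:9201/_cat/segments/index-name:2017-08-08?v&h=shard,segment,docs.count,docs.deleted,size'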


So you have 4021 additional documents (5581202 - 5577181), but note that the count of deleted documents, docs.deleted, also decreased, by 7619 (212434 - 204815), so the net change in the number of documents physically stored in your index (live + deleted) is -3598. This is due to Lucene merging segments under the hood in order to purge deleted documents and reclaim unused space.

That's the most probable reason why the overall index size decreased by 14662452 bytes (~14 MB).

If you want to throttle, you can use docs.count instead; if you're constantly indexing, that number should keep increasing.
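For example, a minimal check along those lines (same index name as in your question):

$ # docs.count excludes deleted documents, so with pure inserts it only grows
$ curl -s 'localhost:9201/_cat/indices/index-name:2017-08-08?h=docs.count'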

14 Comments

Do you happen to know why there would be deleted documents then? I'm only indexing.
If you index a document a second time (i.e. with the same ID), the existing document is marked as deleted and the new version is written separately; documents are never updated in place. Deleted documents are not necessarily ones you have deleted via HTTP DELETE, but also all older versions of existing documents. That's why Lucene regularly cleans up the index by merging segments and removing deleted documents. (There's a short demonstration of this after these comments.)
Hmm, I don't expect any updates since I should be indexing new documents all the time, but I might have to go revisit my code to make sure. Thanks!
But 17 primary shards for your daily index of 3GB sounds like way more than would be necessary. A single shard is capable of holding several GB of data. Granted, I don't know your use case, though.
that's because the day has only started. The index will end up with roughly 17*25GB since I allocate 26GB to my data node
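A quick demonstration of the point about re-indexed IDs above (hypothetical index and document names, just to show the mechanism):

$ # index the same document ID twice
$ curl -s -XPUT 'localhost:9200/test-dupes/doc/1' -d '{"field": "v1"}'
$ curl -s -XPUT 'localhost:9200/test-dupes/doc/1' -d '{"field": "v2"}'
$ # docs.count stays at 1; the first version shows up in docs.deleted until the next merge
$ curl -s 'localhost:9200/_cat/indices/test-dupes?v&h=docs.count,docs.deleted'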