I have a simple table in ClickHouse with 10M rows, defined like this:
```sql
CREATE TABLE data.syslogs
(
    `id` UInt32,
    `time` DateTime,
    `priority` UInt8,
    `message` String,
    INDEX ngrambf_message_index message TYPE ngrambf_v1(5, 65536, 3, 37) GRANULARITY 256
)
ENGINE = MergeTree
PRIMARY KEY (time, id)
ORDER BY (time, id, priority)
SETTINGS index_granularity = 8192
```
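For reference, the ClickHouse documentation gives the signature as `ngrambf_v1(n, size_of_bloom_filter_in_bytes, number_of_hash_functions, random_seed)`. Here that means 5-grams, a 65,536-byte Bloom filter, 3 hash functions and seed 37. `GRANULARITY 256` is counted in table granules, so each skip-index block summarizes 256 × 8192 rows. The Python sketch below only does that arithmetic. It is a simplification that assumes full, fixed-size granules in a single part. Real tables use adaptive granularity (`index_granularity_bytes`) and several parts, so actual counts will differ.

```python
# Parameters copied from the CREATE TABLE statement.
INDEX_GRANULARITY = 8192      # rows per table granule (index_granularity)
SKIP_INDEX_GRANULARITY = 256  # table granules per ngrambf_v1 index block (GRANULARITY)
TOTAL_ROWS = 10_000_000


def rows_per_index_block(index_granularity: int, skip_granularity: int) -> int:
    """Rows summarized by one skip-index entry, i.e. by one Bloom filter."""
    return index_granularity * skip_granularity


def index_block_count(total_rows: int, rows_per_block: int) -> int:
    """Skip-index blocks needed, assuming one part and full granules (simplification)."""
    return -(-total_rows // rows_per_block)  # ceiling division


block_rows = rows_per_index_block(INDEX_GRANULARITY, SKIP_INDEX_GRANULARITY)
blocks = index_block_count(TOTAL_ROWS, block_rows)
print(block_rows, blocks)  # 2097152 5
```

Under these assumptions, the whole table is summarized by only about five Bloom filters of roughly 2M rows each. A block can be skipped only when its filter rules out the searched n-grams, and that is unlikely when, as the output further down shows, 3,900,390 of the 10M rows match.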
I'm running a match query over it like so:
```sql
SELECT message FROM syslogs WHERE match(syslogs.message, 'stat');
```
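As a rough model of what `ngrambf_v1` stores: with n = 5, the filter records the 5-character substrings of each `message`. A search string can only be tested against the filter if it yields at least one 5-gram. The Python sketch below is a simplified illustration and not ClickHouse's implementation. It works on characters rather than bytes, and it ignores how `match()` extracts literal substrings from a regular expression. It shows that the 4-character needle `'stat'` produces no 5-grams at all. By contrast, a longer literal such as `'statusd'` from the example rows does produce 5-grams.

```python
def ngrams(text: str, n: int) -> list[str]:
    """All contiguous substrings of length n (character-based simplification)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


N = 5  # first argument of ngrambf_v1(5, 65536, 3, 37)

needle = "stat"
message = "statusd: something event: station count: 5"

needle_grams = ngrams(needle, N)
message_grams = set(ngrams(message, N))

# A 4-character needle yields no 5-grams, so in this model there is
# nothing to look up in the Bloom filter and no block can be ruled out.
print(needle_grams)  # []

# A needle of at least 5 characters does yield 5-grams the filter can test.
print(ngrams("statusd", N))  # ['statu', 'tatus', 'atusd']
print(all(g in message_grams for g in ngrams("statusd", N)))  # True
```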
The problem is that the index does not seem to skip any data at all. Here is the output:
```
3900390 rows in set. Elapsed: 6.016 sec. Processed 10.00 million rows, 777.80 MB (1.66 million rows/s., 129.29 MB/s.) Peak memory usage: 21.30 MiB.
```
Here are some example rows:
```
9996. │ 17069 │ 2024-08-20 08:40:25 │ 13 │ statusd: something event: station count: 5 │
9997. │ 17069 │ 2024-08-20 08:40:25 │ 13 │ statusd: something event: station count: 5 │
```
It seems the index has no effect at all. I would have expected it to skip at least some rows, even if it is poorly defined.
Any ideas as to why?
Thanks!