Context:
I have an Elasticsearch 7.17.10 running on a Docker machine.
This Elasticsearch cluster has two nodes.
We are ingesting DNS traffic using Packetbeat.
We rotate indices daily: for example, dns-2025-02-20, dns-2025-02-21, etc.
However, we are concerned about memory management and shards/nodes, so we are testing ILM policies with data streams.
We conducted a test:
One Packetbeat sends packets to a data stream index.
Another Packetbeat sends packets to our regular indices.
We ran a script that executes the following query 10 times on each type of index (10 iterations for the regular index and 10 iterations for the data stream):
QUERY='{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2025-02-16T00:00:00Z",
        "lte": "2025-02-20T23:59:59Z"
      }
    }
  }
}'
(Considering that the first request is not cached, but subsequent ones are.)
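For anyone wanting to reproduce the comparison, here is a minimal sketch of such a timing loop. It is not the original script: ES_URL, the helper names (took_ms, average_ms, run_benchmark) and the targets in the usage note are assumptions for illustration. It reads the server-side "took" value from each response, which excludes network and client overhead, so the numbers may differ from wall-clock timings.

```shell
#!/bin/sh
# Sketch only: ES_URL and the function names are assumptions, not from the thread.
ES_URL="${ES_URL:-http://localhost:9200}"

QUERY='{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2025-02-16T00:00:00Z",
        "lte": "2025-02-20T23:59:59Z"
      }
    }
  }
}'

# Extract the "took" value (milliseconds) from a search response on stdin.
took_ms() {
  sed -n 's/.*"took"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Average a list of integers read one per line; prints the truncated mean.
average_ms() {
  awk '{ sum += $1; n++ } END { if (n > 0) printf "%d\n", sum / n }'
}

# Run QUERY against a target (index pattern or data stream) N times
# and print the "took" value of each run, one per line.
run_benchmark() {
  target="$1"; runs="${2:-10}"; i=0
  while [ "$i" -lt "$runs" ]; do
    # filter_path=took trims the response down to the timing field only.
    curl -s -H 'Content-Type: application/json' \
      "$ES_URL/$target/_search?filter_path=took" -d "$QUERY" | took_ms
    i=$((i + 1))
  done
}
```

Usage would look like `run_benchmark 'dns-2025-02-*' 10 | average_ms` for the daily indices and `run_benchmark 'my-dns-stream' 10 | average_ms` for the data stream, where my-dns-stream is a placeholder name. Printing the individual runs before averaging makes it easier to see whether only the first (cold) request is slow.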
For regular indices, with every 10,000 documents, we get an average response time of 850ms.
For data stream indices, with every 10,000 documents, we get an average response time of 3000ms.
Is this possible? Are we missing something?
It seems that data stream indices are much slower compared to regular indices.
No, the mappings are different, although there is no special difference between them.
Do the index mappings affect the query speed that much, when I am not filtering by any field other than @timestamp, which has exactly the same mapping in both indices?
Thanks for clear answers, you'd be surprised how unusual that is!!
Obviously it would depend on the volume and specifics of the differences. In your case, assuming "no special differences", probably not. But differences in the settings (e.g. number of shards, replicas, source settings, ...) could have a more significant effect.
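One way to check the settings side by side is sketched below. This is a hedged example, not something posted in the thread: ES_URL, the helper names and the list of settings to keep are assumptions, and the list is only a starting point. Note that `_source` is configured in the mapping, so it would have to be compared via `_mapping` rather than `_settings`.

```shell
#!/bin/sh
# Sketch only: ES_URL, function names and the chosen settings are assumptions.
ES_URL="${ES_URL:-http://localhost:9200}"

# Keep only a few settings that commonly differ between an index template
# for a data stream and hand-created daily indices.
key_settings() {
  grep -E '"index\.(number_of_shards|number_of_replicas|refresh_interval|codec|sort\.field)"'
}

# Print the filtered, flattened settings for an index, pattern, or data stream.
# Settings left at their defaults are not listed unless include_defaults is used.
show_settings() {
  curl -s "$ES_URL/$1/_settings?flat_settings=true&pretty" | key_settings
}
```

Running `show_settings dns-2025-02-20` and `show_settings my-dns-stream` (placeholder data stream name) and comparing the two outputs by eye should be enough to spot a different shard or replica count.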
Also:
Just to be clear, I meant for the specific indices being tested/compared. The reason is, if one of the indices is being indexed/queried/used at the same time as you are benchmarking, you would distort the results in a way that is difficult to measure. It might be at the margins, or it might make a significant difference.
There shouldn't be significant differences, because the environment where we are running this test is a "dev" environment that we only use occasionally to try out queries, etc. These indices are not normally being used for indexing, querying, or anything else.
Maybe it's some kind of parameter in the configuration of the data stream (ILM, index_template) that I'm missing?
Very definitely yes, at least in some cases. It's like asking whether changing the indices in a SQL database might affect query performance.
From the point of view of a search there's no difference between the indices that make up a datastream and any other indices, so the performance drop is coming from some other difference between your two setups.
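Since a data stream is searched through its backing indices, comparing the actual shard layout of both setups can help locate that "other difference". A rough sketch follows; ES_URL and the helper names are assumptions, and the data stream name in the usage note is a placeholder.

```shell
#!/bin/sh
# Sketch only: ES_URL and the function names are assumptions.
ES_URL="${ES_URL:-http://localhost:9200}"

# List shards for a target (index pattern or data stream):
# one line per shard with index, shard number, primary/replica flag, doc count.
list_shards() {
  curl -s "$ES_URL/_cat/shards/$1?h=index,shard,prirep,docs"
}

# From that listing on stdin, print per index:
# index name, number of primary shards, total docs in primaries.
primaries_per_index() {
  awk '$3 == "p" { shards[$1]++; docs[$1] += $4 }
       END { for (i in shards) print i, shards[i], docs[i] }' | sort
}
```

For example, `list_shards 'dns-2025-02-*' | primaries_per_index` versus `list_shards my-dns-stream | primaries_per_index` shows how many primary shards and documents each query actually has to cover. If the two targets hold very different document counts over the queried time range, the timings are not directly comparable.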