Bulk get (_mget) performance when using ES as key value store

mjuric · January 4, 2019, 12:52pm

We primarily use Elasticsearch as a search index, but we now have an internal use case where we use it as a key-value store, fetching a lot of documents at once by their _id field through the bulk get API (_mget). We have doubts about how suitable ES is for this, given the way Lucene stores data internally: our fear is that throwing 1K keys at it in each bulk request just produces a lot of random lookups underneath. Somebody has probably asked something similar before, but I could not find a clear-cut answer, so I would really appreciate it if someone could clarify this for me. Thanks.
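
For context, here is a minimal sketch of how such requests could be assembled. It assumes the `ids` shorthand of the _mget request body and a batch size of 1K as mentioned above. The key names and the helper names `chunked` and `mget_body` are hypothetical, and the exact URL path for the request depends on the Elasticsearch version in use.

```python
import json
from typing import Iterable, Iterator, List


def chunked(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` ids."""
    batch: List[str] = []
    for doc_id in ids:
        batch.append(doc_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def mget_body(ids: List[str]) -> str:
    """Serialize a JSON body for an _mget request using the `ids` shorthand."""
    return json.dumps({"ids": ids})


# Hypothetical keys; in practice these are the _id values to look up.
keys = [f"key-{i}" for i in range(2500)]

# One request body per batch of up to 1000 ids.
bodies = [mget_body(batch) for batch in chunked(keys, 1000)]
```

Each entry in `bodies` would then be sent as the body of one _mget request against the index.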

s1monw · January 11, 2019, 9:11am

We do a ton of primary key lookups during indexing, so I think 1k keys should work just fine. That's a gut feeling, though.

Slavisa_Djukic · January 30, 2019, 5:45pm

We've been experimenting with the same idea. We want to process roughly 10M records an hour. The records are not unique on the fields we're interested in, so the workflow is: batch the records, hit ES with mget, build a CDC-like (change data capture) structure, and index only new records or records whose fields changed. Batch size is 5k. As the index grows, mget performance degrades seriously. It seems to be linear in the size of the index.
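
The comparison step of that workflow could look roughly like the sketch below. It is only an illustration under assumptions: `incoming` is a batch already keyed by document id (so duplicates within the batch collapse to the last record), `stored` maps each id to the fields returned by mget or to `None` when the document was not found, and the function name `changed_records` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Doc = Dict[str, object]


def changed_records(
    incoming: Dict[str, Doc],
    stored: Dict[str, Optional[Doc]],
) -> List[Tuple[str, str]]:
    """Return (id, operation) pairs for records that are new or have changed.

    `stored` is assumed to hold what mget returned for each id,
    with None standing for "document not found".
    """
    changes: List[Tuple[str, str]] = []
    for doc_id, fields in incoming.items():
        current = stored.get(doc_id)
        if current is None:
            changes.append((doc_id, "create"))   # not in the index yet
        elif current != fields:
            changes.append((doc_id, "update"))   # at least one field differs
        # identical records are skipped and never re-indexed
    return changes


# Hypothetical batch and mget result.
incoming = {"a": {"x": 1}, "b": {"x": 2}, "c": {"x": 3}}
stored = {"a": {"x": 1}, "b": {"x": 9}, "c": None}
result = changed_records(incoming, stored)
```

Only the ids in `result` would then be sent on for indexing.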

Index at ~3M records and ~1GB:
5k ids mget ~0.2s

Index at about 130M records and ~40GB:
5k ids mget ~10s

P.S. This is a 'default' setup: a single EC2 instance with an attached EBS SSD, and one index with 5 shards.
I've just started with ES, so any resources or ideas are appreciated.
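
To put the figures above in perspective, here is a rough back-of-the-envelope calculation. It assumes the mget requests are issued one after another with no parallelism, which the post does not state, and it uses the approximate latencies quoted above.

```python
RECORDS_PER_HOUR = 10_000_000   # target throughput from the post
BATCH_SIZE = 5_000              # ids per mget request
SECONDS_PER_HOUR = 3600

# Number of mget requests needed per hour.
batches_per_hour = RECORDS_PER_HOUR // BATCH_SIZE

# Time budget per batch if requests run sequentially (an assumption).
budget_s = SECONDS_PER_HOUR / batches_per_hour


def total_mget_seconds(latency_s: float) -> float:
    """Total time spent in mget per hour of input at a given per-request latency."""
    return batches_per_hour * latency_s


small_index = total_mget_seconds(0.2)   # ~3M records, ~1GB
large_index = total_mget_seconds(10.0)  # ~130M records, ~40GB

print(f"{batches_per_hour} batches/hour, {budget_s:.1f}s budget per batch")
print(f"small index: {small_index:.0f}s of mget per hour of input")
print(f"large index: {large_index:.0f}s of mget per hour of input")
```

Under these assumptions, the lookups fit comfortably within the hour on the small index, while on the large index the mget calls alone would take several times longer than the hour of input they cover.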

system · February 27, 2019, 5:45pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
