Search source code and also use Elasticsearch as a store for it at the same time

No problem @ice3man543!

These two issues are a function of how much memory your nodes have available in addition to the configured JVM heap size. There is a bit of background on this here, but it boils down to this:
The document source is loaded from disk. Disk access is fast if you have enough RAM that the file system cache serves reads relatively often compared to physical disk reads, and the same goes for searches that need to load data from disk.
=> The more RAM you have and the faster your disks are, the less of an issue this is, so as long as you size your nodes accordingly this should be fine.

The number of documents is not an issue either. The only limit to keep in mind is that a single shard can hold at most ~2B documents (the 32-bit signed int maximum) because Lucene uses int document IDs. So when deciding how many shards to use per index, make that number large enough that no single shard approaches this limit, but that's it.
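For illustration, here is a minimal sketch of setting the primary shard count at index creation time, assuming the official Python client (8.x) and a hypothetical index name:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Hypothetical sizing: if you expect ~10B documents, 5 shards is the bare
# minimum to stay under the ~2B-docs-per-shard Lucene limit; 10 leaves headroom.
es.indices.create(
    index="source-code",  # hypothetical index name
    settings={"number_of_shards": 10},
)
```

Note that the number of primary shards is fixed once the index is created, so it pays to leave some headroom up front.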

Without more quantitative details I would say you're most likely fine just letting all those workers work independently. The important thing to look at is the number of documents you index in a single bulk request. Try to have the individual workers send bulk requests of multiple documents where possible, but unless we're talking about an extreme case of thousands of workers or so, this should not be an issue. In your case in particular, the bulk size should probably be on the smaller end because of the expected slightly larger document size, so manually batching across workers seems even less useful.
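As a rough sketch of what each worker could do, again assuming the official Python client and hypothetical index/field names, with a smallish chunk size because the documents are larger:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

def actions(files):
    # files: iterable of (path, content) pairs -- hypothetical input shape
    for path, content in files:
        yield {
            "_index": "source-code",  # hypothetical index name
            "_source": {"path": path, "content": content},
        }

# my_files is a placeholder for this worker's share of the source files.
# helpers.bulk batches the actions into bulk requests for you; a smaller
# chunk_size keeps individual request sizes reasonable for large documents.
helpers.bulk(es, actions(my_files), chunk_size=100)
```

The `helpers.bulk` helper handles the batching, so each worker just streams its own documents through it independently.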
=> it's pretty unlikely that this will be an issue, I think :slight_smile: