Search source code and also use Elasticsearch as a store for it at the same time

No problem @ice3man543!

These two issues are a function of how much memory your nodes have available in addition to the configured JVM heap size. There is a bit of background on this here, but it boils down to this:
The document source is loaded from disk. Disk access is fast if you have enough RAM that the file system cache serves reads relatively often compared to physical disk reads, and the same goes for searches that need to load data from disk.
=> The more RAM you have and the faster your disks are, the less of an issue this is, so as long as you size your nodes accordingly this should be fine.

The number of documents is not an issue either. The only limit to keep in mind is that a single shard can hold at most ~2B documents (the 32-bit signed int maximum) because Lucene uses int document IDs. So when deciding how many shards to use per index, make that number large enough that no single shard approaches this limit, but that's it.
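For illustration, here is a minimal sketch of setting the primary shard count at index creation time, assuming the official Python client (8.x) and a hypothetical index name:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Hypothetical sizing: if you expect ~10B documents, 5 shards is the bare
# minimum to stay under the ~2B-docs-per-shard Lucene limit; 10 leaves headroom.
es.indices.create(
    index="source-code",  # hypothetical index name
    settings={"number_of_shards": 10},
)
```

Note that the number of primary shards is fixed once the index is created, so it pays to leave some headroom up front.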

Without more quantitative details I would say you're most likely fine just letting all those workers work independently. The important thing to look at is the number of documents you index in a single bulk request. Try to have the individual workers send bulk requests of multiple documents where possible, but unless we're talking about an extreme case of thousands of workers or so, this should not be an issue. In your case in particular, the bulk size should probably be on the smaller end because of the expected slightly larger document size, so manually batching across workers seems even less useful.
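As a rough sketch of what each worker could do, again assuming the official Python client and hypothetical index/field names, with a smallish chunk size because the documents are larger:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

def actions(files):
    # files: iterable of (path, content) pairs -- hypothetical input shape
    for path, content in files:
        yield {
            "_index": "source-code",  # hypothetical index name
            "_source": {"path": path, "content": content},
        }

# my_files is a placeholder for this worker's share of the source files.
# helpers.bulk batches the actions into bulk requests for you; a smaller
# chunk_size keeps individual request sizes reasonable for large documents.
helpers.bulk(es, actions(my_files), chunk_size=100)
```

The `helpers.bulk` helper handles the batching, so each worker just streams its own documents through it independently.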
=> it's pretty unlikely that this will be an issue, I think :slight_smile: