Min RAM during dev for large dataset

So I've really got two questions here.

  1. If I have about 100 GB of documents that I want to make searchable with Elasticsearch, is it bad to just put it all in a single node / shard? (I can figure out replicas later when we start looking at production.)

  2. Also, how much RAM do I NEED? Is it possible to run this ES instance on a machine with only 8 GB of RAM (just during development) and have it run slower, or do I need to shell out now for a system with more memory? I can always scale up to optimal RAM later on.

My use case is that I am prototyping a system and need to get our full document set indexed so we can compare it apples to apples in usability testing against the existing system. Performance isn't a big concern right now. My dev machine is just an i7 ultrabook with 8 GB of RAM, and for the first, smaller version of the prototype, which only had about 30 MB of documents, my machine was just fine. Is it even possible for me to use this machine for dev with the next version of the prototype, or do I need to shell out now for a more powerful machine?

I would like to comment from my experience.

If you only have one node, there is not much point in discussing the number of primary and replica shards. Starting off with the default settings is okay. You may want to set 1 primary shard per node if you want to measure how much resource Elasticsearch uses per shard.
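For example, assuming an index named `docs` (the name is just for illustration), a single-primary-shard layout can be requested at index-creation time with settings like the following, sent as the body of the create-index request (replicas set to 0 here, since a one-node cluster cannot allocate a replica anyway):

```json
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}
```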

> Also, how much RAM do I NEED? Is it possible to run this ES instance on a machine with only 8 GB of RAM (just during development) and have it run slower, or do I need to shell out now for a system with more memory? I can always scale up to optimal RAM later on.

Elasticsearch ships with 2 GB of heap memory as the default setting (the snippet below is from v5.4.0):

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
-Xms2g
-Xmx2g
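
If the default turns out to be too small once real indexing starts, those same two lines in `jvm.options` can be raised together (Xms and Xmx should stay equal so the heap doesn't resize at runtime); a hypothetical bump to 4 GB would look like:

-Xms4g
-Xmx4g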

That means, at a minimum, you will need:

  1. 2 GB of memory for the Elasticsearch heap
  2. X GB for your OS
  3. additional free memory, since Elasticsearch also relies on the OS cache for indexing and query performance.

The amount of memory and storage Elasticsearch uses will depend on how you index your fields.

If I were you, I would start with a heap size of 4 GB (read this) and prepare 300 GB or more of storage if you can afford it (more than 3 times the size of your raw data).
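As a rough sanity check, that sizing advice can be written out as arithmetic. `dev_node_budget` below is just an illustrative helper; the half-of-RAM heap rule and the 3x disk multiplier are rules of thumb from this thread, not guarantees:

```python
# Rough capacity estimate for a single dev node, based on the rules of
# thumb above: heap at most half of physical RAM (leaving the rest for
# the OS cache), and ~3x the raw data size on disk.

def dev_node_budget(raw_data_gb, machine_ram_gb):
    # Heap rule of thumb: at most half the machine's RAM, and never
    # above ~31 GB (the compressed-oops ceiling).
    heap_gb = min(machine_ram_gb / 2, 31)
    # Disk rule of thumb: plan for roughly 3x the raw data size.
    disk_gb = raw_data_gb * 3
    return {"heap_gb": heap_gb, "disk_gb": disk_gb}

# The questioner's case: 100 GB of raw documents on an 8 GB laptop.
budget = dev_node_budget(raw_data_gb=100, machine_ram_gb=8)
print(budget)  # {'heap_gb': 4.0, 'disk_gb': 300}
```

So on an 8 GB machine the 4 GB heap suggestion above is already at the half-of-RAM ceiling, which is why indexing may be slow but should still be possible.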

As you index the documents, watch how your index size and memory usage grow.

Hope it helps.

Thanks Yu, this is helpful. I'll definitely be giving this a try then. I wasn't sure if I should even bother or if my computer would just crash from the load or something. I will follow up and post an update once I've tried it out.
