Running multiple nodes in development mode


(Sander Theetaert) #1

Dear all,

we use elastic for very few users (2 people who understand that the environment can be down),
so the setup doesn't need to meet production demands.

But we do have some limitations,

Our first node, which worked fine for quite some time, is currently doing excessive garbage collection,

so I thought that adding another node would be a feasible solution.

Thinking it would be a walk in the park I encountered some, well, rain...

Apparently, as soon as a second node comes in (on another host), elastic enforces production demands
(warnings become errors).

Some reasons why we can't meet those demands:

  • (foremost) we don't have root on the machines...
  • the people that do have root are not keen on configuring the linux machines (we already are kind of the exception in a large organisation)

Other issues:

  • we don't have much memory left on the single node we had (222 MB), so increasing memory will only postpone the problem
  • the data is on that node; maybe moving the data to a virtual machine that has more memory is a solution (but I guess that also just postpones the problem)
  • we do want to keep the data we have so far, and it will grow (the current solution worked for about 4 months)

Is there a way to enforce development mode in a cluster with multiple hosts, or does anyone see another (preferably long-term) solution?


(Daniel Mitterdorfer) #2

Hi @bamboomy,

unfortunately, it is not possible to circumvent the production mode checks. In general, the idea behind these checks is to help you prevent common issues that could lead to data loss. So the best option is to honor these settings and to convince the ops people that it is necessary (although I understand that this is hard in your organization).
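
As background on why the second host triggers the checks: to the best of my knowledge, Elasticsearch treats a node as being in production mode once its transport layer binds to a non-loopback address. Below is a minimal Python sketch of that decision, not Elasticsearch's actual implementation; the function name `enforces_bootstrap_checks` is hypothetical, and anything that is not a literal IP address is conservatively treated as non-loopback.

```python
import ipaddress


def enforces_bootstrap_checks(bind_host: str) -> bool:
    """Guess whether binding to bind_host would put a node in production mode.

    Assumption: a non-loopback bind address is what turns the bootstrap
    warnings into errors. This is a simplified model for illustration.
    """
    # "localhost" and "_local_" are treated as loopback by this sketch.
    if bind_host in ("localhost", "_local_"):
        return False
    try:
        return not ipaddress.ip_address(bind_host).is_loopback
    except ValueError:
        # Not a literal IP (for example a hostname): assume it is reachable
        # from other machines, so the checks would be enforced.
        return True


if __name__ == "__main__":
    for host in ("127.0.0.1", "::1", "192.168.1.10", "0.0.0.0"):
        print(host, enforces_bootstrap_checks(host))
```

This is why a single node bound to localhost only logs warnings, while a node that must be reachable from another host fails to start until the checks pass.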

If you experience garbage collection issues, you can do one of three things:

  1. Tune the garbage collector
  2. Reduce memory pressure
  3. Increase the heap size (not an (easy) option for you)

I think, for the short term, the best option given your circumstances is option 2. I'd start by reviewing the mapping and eliminate any fields that are not strictly needed, disable the _all field, etc. You can check the How-to guides that are written to help you increase index and search speed and sometimes apply settings contrary to that advice (e.g. if the how-to speaks about increasing some cache you could try to reduce it). This will obviously be bad for performance but will help you to stay within memory limits.
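
To illustrate the mapping review, here is a minimal Python sketch that builds a trimmed index body. It assumes an Elasticsearch version before 6.0, where mapping types and the `_all` field still exist; the type name `doc`, the helper `build_slim_mapping`, and the field names are hypothetical and only serve the example.

```python
import json
from typing import Dict, Iterable


def build_slim_mapping(properties: Dict[str, dict],
                       needed_fields: Iterable[str],
                       type_name: str = "doc") -> dict:
    """Return an index body that keeps only the fields that are really needed."""
    needed = set(needed_fields)
    kept = {name: spec for name, spec in properties.items() if name in needed}
    return {
        "mappings": {
            type_name: {
                # Do not build the catch-all field that copies every value.
                "_all": {"enabled": False},
                # Ignore unmapped fields instead of adding them dynamically.
                "dynamic": False,
                "properties": kept,
            }
        }
    }


if __name__ == "__main__":
    current = {
        "message": {"type": "text"},
        "timestamp": {"type": "date"},
        "debug_payload": {"type": "text"},  # hypothetical field we can drop
    }
    body = build_slim_mapping(current, ["message", "timestamp"])
    print(json.dumps(body, indent=2))
```

Such a body would be supplied when creating a new index; since existing field mappings cannot simply be removed, already indexed data would have to be reindexed into the slimmer index.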

The longer term solution IMHO is to:

  • Get more memory: Note that Elasticsearch and specifically Lucene also use native memory in order to keep index files in the operating system's file system cache. You should plan to have at least half of the memory available for the file system cache.
  • Have production checks pass: I understand that it is hard to convince the ops people in your organization but risking data loss due to wrong settings does not help either.
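
The "half for the file system cache" rule can be turned into a small sizing helper. This is only a sketch: the function name `suggest_heap_mb` is hypothetical, and the cap of roughly 31 GB is the commonly cited approximation for keeping compressed object pointers enabled in the JVM, not an exact threshold.

```python
def suggest_heap_mb(total_ram_mb: int, cap_mb: int = 31 * 1024) -> int:
    """Suggest a JVM heap size in MB for a machine dedicated to Elasticsearch.

    Leaves at least half of the RAM to the operating system's file system
    cache and stays below an assumed compressed-pointers cap.
    """
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    return min(total_ram_mb // 2, cap_mb)


if __name__ == "__main__":
    for ram in (2048, 8192, 131072):
        heap = suggest_heap_mb(ram)
        # The same value would be used for both -Xms and -Xmx.
        print(f"{ram} MB RAM: heap {heap} MB, {ram - heap} MB left for the OS")
```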

It would probably make sense to experiment, e.g. on EC2, to determine your memory demands. The Webinar Using Rally to Get Your Elasticsearch Cluster Size Right has a few tips on how to achieve that (note that it requires prior registration but is otherwise free to watch).

I hope this gives you some pointers on how to proceed.

Daniel


(Sander Theetaert) #3

Hey @danielmitterdorfer,

thanks for your extensive answer, I'm sure it will also be helpful to other people,

our solution was option 2, reducing memory pressure:
we don't need all of the data all of the time, so we 'swap out' old data with snapshots on the filesystem and can always 'swap' old data back 'in' when needed.
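
A minimal Python sketch of that swap-out / swap-in cycle, expressed as the sequence of REST calls one would send. It assumes a filesystem snapshot repository has already been registered; the repository, snapshot, and index names are hypothetical, and the sketch only builds the requests rather than sending them.

```python
from typing import List, Optional, Tuple

# (HTTP method, path, JSON body or None)
Request = Tuple[str, str, Optional[dict]]


def swap_out(repo: str, snapshot: str, index: str) -> List[Request]:
    """Requests to snapshot an old index and then free its memory."""
    return [
        # 1. Snapshot only this index and wait until the snapshot is done.
        ("PUT",
         f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
         {"indices": index, "include_global_state": False}),
        # 2. Delete the index. Only send this if step 1 succeeded.
        ("DELETE", f"/{index}", None),
    ]


def swap_in(repo: str, snapshot: str, index: str) -> List[Request]:
    """Requests to restore a previously swapped-out index."""
    return [
        ("POST",
         f"/_snapshot/{repo}/{snapshot}/_restore?wait_for_completion=true",
         {"indices": index}),
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for method, path, body in swap_out("fs_backup", "logs-2017-01", "logs-2017.01"):
        print(method, path, body)
    for method, path, body in swap_in("fs_backup", "logs-2017-01", "logs-2017.01"):
        print(method, path, body)
```

Keeping the delete strictly after a successful snapshot is the important part; otherwise the old data is gone for good.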

I posted the question in parallel with investigating options to see "who would win", and this blog post was indeed quite clear on the issue :wink:

I hope other people will be helped as well, surely by your post, maybe by mine as well,

Thanks again,

S.


(Daniel Mitterdorfer) #4

Hi @bamboomy,

reducing the number of indices is definitely an option and I'm glad that you've found a solution. Good that you've also mentioned the blog post here.

I think, though, that in the longer term your organization has to decide whether it's cheaper to give your instance a bit more resources or to have you as an employee constantly shuffling indices / snapshots around. :slight_smile:

Daniel

