Twice in the last two months, the Elasticsearch process has been killed by the OS oom-killer, so I've been monitoring things more closely recently.
I'm running a single shard on a Raspberry Pi (so 1 GB RAM, 4 cores, no swap, Raspbian Stretch), currently on ES 6.1.1. I set both the initial and maximum heap to 512m in jvm.options, and on startup top shows 75.6% of memory allocated to the ES Java process, which, as you'd expect, is by far the largest memory user. I know there is stack and code-cache consumption on top of the heap, but that seems like quite a lot compared with the requested heap. The real problem, though, is that the process grows at a fairly constant rate - about 0.6% of memory per day - so it inevitably crashes sooner or later. It has been running happily in this configuration for about two years, and the memory problems are fairly recent, so either it's something to do with a recent version, or something I've changed in how I use it is provoking this.
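To put numbers on that growth, a small logging sketch like this could be run from cron. It's only illustrative: it samples the current shell's pid, `$$`, so that it's self-contained - against a real node you'd substitute the Elasticsearch pid (e.g. via `pgrep -f org.elasticsearch`).

```shell
#!/bin/sh
# Illustrative sketch: record a process's resident set size (RSS) so the
# daily growth rate can be measured over time.
# PID=$$ samples this shell itself; replace with the Elasticsearch pid.
PID=$$
RSS_KB=$(ps -o rss= -p "$PID" | tr -d ' ')
RSS_MB=$((RSS_KB / 1024))
echo "$(date) pid=$PID rss_kb=$RSS_KB rss_mb=$RSS_MB"
```

Appending that output to a file once an hour gives a clean series to confirm whether the growth really is linear.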
When it crashes, it's usually during backup - I guess either because backup touches every record, or simply because it's a relatively long-running process, so anything else that starts while it's running tips memory over the edge. So I don't think the backup itself is to blame.
I could, of course, proactively restart ES every week or so to preempt problems. However, it would be better to locate the cause; it has all the hallmarks of a memory leak.
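The scheduled-restart workaround is at least easy to sketch as a crontab entry (assuming systemd and the stock Debian/Raspbian service name `elasticsearch` - adjust for your setup):

```
# Illustrative crontab entry (crontab -e as root): restart Elasticsearch
# every Sunday at 03:00, ahead of the backup window, to preempt the OOM.
0 3 * * 0 systemctl restart elasticsearch
```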
Perhaps it might be better to set the heap lower, so the JVM runs out of heap before the OS runs out of memory, though I suspect the end result would ultimately be the same. But since ES presumably caches a lot in memory, it makes sense to run near memory capacity for performance reasons; and in any case, if it steadily adds 6 MB or so a day, sooner or later it would run out anyway.
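Lowering the heap is just the two bounds in jvm.options; an OutOfMemoryError would then at least appear in the ES log rather than only in the kernel's oom-killer output. The 384m figure here is purely an illustrative value:

```
# jvm.options (illustrative values): keep -Xms and -Xmx equal but lower,
# leaving more headroom for off-heap/native memory on a 1 GB machine.
-Xms384m
-Xmx384m
```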
I'm using it perhaps a bit differently than, for example, Logstash would. Logstash adds new indexes every day and primarily inserts new records with little subsequent modification, whereas mine is a more conventional app: the indexes are fixed and the records within them change routinely. It's also quite a small database with quite diverse data, so there are between 30 and 40 dissimilar indexes per database, and with 5 instances of the app running that's around 200 indexes actively in use, though not terribly heavily.
What can I do to pin this down further? (I have another Pi set up almost identically which I could experiment with more freely than this production one, if that helps.)
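One way to start on the spare Pi might be to sample the node-stats API alongside the OS-level RSS, so heap growth can be separated from off-heap/native growth. These are standard ES 6.x endpoints, assuming the default localhost:9200 binding:

```
# JVM heap vs. process memory: if RSS keeps growing while heap_used stays
# flat, the growth is off-heap (direct buffers, mapped segments, native).
curl -s 'localhost:9200/_nodes/stats/jvm,process?pretty'

# Per-index segment memory, to see whether any index is steadily
# accumulating, sorted largest first:
curl -s 'localhost:9200/_cat/indices?v&h=index,segments.memory&s=segments.memory:desc'
```

Logging both alongside the RSS figure should show whether the daily 0.6% is inside the heap (a leak the JVM could expose) or outside it.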