Unable to start Elasticsearch instance. It says Out of Memory error

Hi All,

I am facing a serious error in my Elasticsearch instance. Before I explain the error, I want to give a short overview of the environment I am working with.

We have a test server (Linux) with 3 GB of RAM, of which about 2019 MB is almost always free, and 1999 MB of swap, of which 1383 MB is free.

I have 2 instances of Elasticsearch. Both are version 6.2.4. The only difference is that one is secured with Search Guard and the other is not TLS secured. Recently, for some reason, the unsecured instance (usually used for experiments, with a lot of useless sample indices) has been unable to start. Whenever I try to start it, it produces "hs_err_pid12345.log". The log says that there is insufficient memory for the Java Runtime Environment. The ES logs also keep reporting GC (Allocation Failure).

Since it's quite long, the content of "hs_err_pid12345.log" is at the end.

The other, secured instance used to work fine, but recently it has started going down on its own. I assume it is hitting a similar issue?

Now I have tried various methods to resolve it.

  1. I tried setting both the max and min heap to 2 GB in jvm.options (didn't work).
  2. I read somewhere that the above only makes a difference when running ES for the first time, so I created the environment variable mentioned in the documentation (didn't work).
  3. I tried shutting down the secured ES instance, in case it was consuming memory (didn't work).
  4. I deleted unnecessary data from the physical/ROM memory drive, which changed it from 1.8 to 2.3 GB (didn't work).

Finally, I wanted to delete unnecessary indices from the unsecured ES, but I cannot bring it up even once to do that.

Finally, what are my options?
How can I make it work?

There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (mmap) failed to map 4207738880 bytes for committing reserved memory.
Possible reasons:
The system is out of physical RAM or swap space
In 32 bit mode, the process size limit was hit
Possible solutions:
Reduce memory load on the system
Increase physical memory or swap space
Check if swap backing store is full
Use 64 bit Java on a 64 bit OS
Decrease Java heap size (-Xmx/-Xms)
Decrease number of Java threads
Decrease Java thread stack sizes (-Xss)
Set larger code cache with -XX:ReservedCodeCacheSize=
This output file may be truncated or incomplete.

Out of Memory Error (os_linux.cpp:2640), pid=27501, tid=0x00007f40da7b1700

JRE version: (8.0_161-b12) (build )
Java VM: Java HotSpot(TM) 64-Bit Server VM (25.161-b12 mixed mode linux-amd64 compressed oops)
Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

--------------- T H R E A D ---------------

--------------- P R O C E S S ---------------

Java Threads: ( => current thread )

Other Threads:

=>0x00007f40d400b800 (exited) JavaThread "Unknown thread" [_thread_in_vm, id=27539, stack(0x00007f40da6b2000,0x00007f40da7b2000)]

VM state:not at safepoint (not fully initialized)

VM Mutex/Monitor currently owned by a thread: None

GC Heap History (0 events):
No events

Deoptimization events (0 events):
No events

Classes redefined (0 events):
No events

Internal exceptions (0 events):
No events

Events (0 events):
No events

Environment Variables:
JAVA_HOME=/abcd/username/jdk1.8.0_161/

--------------- S Y S T E M ---------------

OS:Red Hat Enterprise Linux Server release 7.6 (Maipo)

uname:Linux 3.10.0-957.12.2.el7.x86_64 #1 SMP Fri Apr 19 21:09:07 UTC 2019 x86_64
libc:glibc 2.17 NPTL 2.17
rlimit: STACK 8192k, CORE 0k, NPROC 65536, NOFILE 65536, AS infinity
load average:0.08 0.04 0.05

/proc/meminfo:
MemTotal: 3880944 kB
MemFree: 105132 kB
MemAvailable: 190676 kB
Buffers: 0 kB
Cached: 269236 kB
SwapCached: 30552 kB
Active: 2565312 kB
Inactive: 1008092 kB
Active(anon): 2433264 kB
Inactive(anon): 875208 kB
Active(file): 132048 kB
Inactive(file): 132884 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 2047996 kB
SwapFree: 1627132 kB
Dirty: 80 kB
Writeback: 0 kB
AnonPages: 3279516 kB
Mapped: 43808 kB
Shmem: 4300 kB
Slab: 84012 kB
SReclaimable: 50760 kB
SUnreclaim: 33252 kB
KernelStack: 9152 kB
PageTables: 16032 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 3988468 kB
Committed_AS: 5002496 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 163068 kB
VmallocChunk: 34359341052 kB
HardwareCorrupted: 0 kB
AnonHugePages: 585728 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 75712 kB
DirectMap2M: 3069952 kB
DirectMap1G: 3145728 kB

CPU:total 1 (initial active 1) (1 cores per cpu, 1 threads per core) family 6 model 85 stepping 4, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, avx, aes, clmul, erms, 3dnowpref, tsc, tscinvbit

/proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz
stepping : 4
microcode : 0x200004d
cpu MHz : 2693.671
cache size : 33792 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm 3dnowprefetch ssbd ibrs ibpb stibp fsgsbase smep arat spec_ctrl intel_stibp flush_l1d arch_capabilities
bogomips : 5387.34
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

Memory: 4k page, physical 3880944k(105132k free), swap 2047996k(1627132k free)

vm_info: Java HotSpot(TM) 64-Bit Server VM (25.161-b12) for linux-amd64 JRE (1.8.0_161-b12), built on Dec 19 2017 16:12:43 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)

time: Fri Aug 30 11:17:30 2019
elapsed time: 0 seconds (0d 0h 0m 0s)

Hi @groverjatin17,

It is not clear to me whether you have both instances running or not. I mean, when you start the instance that cannot start, is the other instance running at that time? If so, shutting it down first should cure the issue.

If you have trouble shutting it down, you can kill it instead. Check using ps that the process is gone.

Also check for other processes on the node occupying memory.

Finally, you can look into the kernel's overcommit settings; they may be set up too strictly.
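As a rough way to look at the overcommit side, here is a minimal Python sketch. It parses the `CommitLimit` and `Committed_AS` figures that already appear in the /proc/meminfo section of the hs_err log above, and then tries to read the kernel's overcommit mode from `/proc/sys/vm/overcommit_memory` (0 = heuristic, 1 = always, 2 = strict). The function names are made up for illustration. Under strict mode the kernel refuses allocations that would push `Committed_AS` past `CommitLimit`; in the posted snapshot `Committed_AS` is already above `CommitLimit`, so it seems worth reading the mode directly rather than guessing.

```python
import re

# Excerpt copied from the /proc/meminfo section of the hs_err log in this thread.
MEMINFO_EXCERPT = """\
MemTotal: 3880944 kB
CommitLimit: 3988468 kB
Committed_AS: 5002496 kB
"""

OVERCOMMIT_MODES = {0: "heuristic", 1: "always overcommit", 2: "strict (never overcommit)"}


def parse_meminfo(text):
    """Return a dict of field name -> value in kB for 'Name: N kB' lines."""
    fields = {}
    for match in re.finditer(r"^(\S+):\s+(\d+)\s+kB$", text, flags=re.MULTILINE):
        fields[match.group(1)] = int(match.group(2))
    return fields


def commit_headroom_kb(fields):
    """CommitLimit minus Committed_AS; negative means more is committed than the limit."""
    return fields["CommitLimit"] - fields["Committed_AS"]


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Read the kernel overcommit mode on a Linux host (0, 1 or 2)."""
    with open(path) as handle:
        return int(handle.read().strip())


fields = parse_meminfo(MEMINFO_EXCERPT)
print("commit headroom (kB):", commit_headroom_kb(fields))

try:
    mode = read_overcommit_mode()
    print("vm.overcommit_memory =", mode, "->", OVERCOMMIT_MODES.get(mode, "unknown"))
except (OSError, ValueError):
    # Not a Linux host, or /proc is not readable from this environment.
    print("could not read the overcommit mode on this machine")
```

On the server itself you would replace the excerpt with the contents of /proc/meminfo taken while ES is stopped.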

Thanks for your reply.

Since I want to get the unsecured ES instance working, the other (secured) instance is shut down. Yes, I killed the process.

I would like to mention, though, that both instances ran in parallel for 4 months without any error.

Can you recommend any other way to get the instance running at least once, so that I can delete unwanted indices and release some memory?

Hi Henning,

I would also like to mention that I also get a lot of GC (Allocation Failure) in the logs when I start a new instance. Do you think that something needs to be fixed outside ES config?

LOGS:

Java HotSpot(TM) 64-Bit Server VM (25.161-b12) for linux-amd64 JRE (1.8.0_161-b12), built on Dec 19 2017 16:12:43 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 3880944k(1720364k free), swap 2047996k(1415988k free)
CommandLine flags: -XX:+AlwaysPreTouch -XX:CMSInitiatingOccupancyFraction=75 -XX:GCLogFileSize=67108864 -XX:+HeapDumpOnOutOfMemoryError -XX:InitialHeapSize=2147483648 -XX:MaxHeapSize=2147483648 -XX:MaxNewSize=87244800 -XX:MaxTenuringThreshold=6 -XX:NewSize=87244800 -XX:NumberOfGCLogFiles=32 -XX:OldPLABSize=16 -XX:OldSize=174489600 -XX:-OmitStackTraceInFastThrow -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:ThreadStackSize=1024 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation -XX:+UseParNewGC 
2019-09-03T09:09:31.328+0000: 3.553: Total time for which application threads were stopped: 0.0001282 seconds, Stopping threads took: 0.0000166 seconds
2019-09-03T09:09:31.780+0000: 4.005: Total time for which application threads were stopped: 0.0001430 seconds, Stopping threads took: 0.0000159 seconds
2019-09-03T09:09:31.854+0000: 4.079: Total time for which application threads were stopped: 0.0001492 seconds, Stopping threads took: 0.0000193 seconds
2019-09-03T09:09:31.873+0000: 4.098: Total time for which application threads were stopped: 0.0001318 seconds, Stopping threads took: 0.0000171 seconds
2019-09-03T09:09:32.149+0000: 4.374: Total time for which application threads were stopped: 0.0001827 seconds, Stopping threads took: 0.0000167 seconds
2019-09-03T09:09:32.303+0000: 4.528: Total time for which application threads were stopped: 0.0002064 seconds, Stopping threads took: 0.0000158 seconds
2019-09-03T09:09:32.318+0000: 4.542: Total time for which application threads were stopped: 0.0001554 seconds, Stopping threads took: 0.0000138 seconds
2019-09-03T09:09:32.409+0000: 4.633: [GC (Allocation Failure) 2019-09-03T09:09:32.409+0000: 4.633: [ParNew
Desired survivor size 4358144 bytes, new threshold 1 (max 6)
- age   1:    8520240 bytes,    8520240 total
: 68160K->8344K(76672K), 0.0242991 secs] 68160K->8344K(2088640K), 0.0243988 secs] [Times: user=0.02 sys=0.00, real=0.03 secs] 
2019-09-03T09:09:32.433+0000: 4.658: Total time for which application threads were stopped: 0.0245461 seconds, Stopping threads took: 0.0000158 seconds
2019-09-03T09:09:32.603+0000: 4.828: Total time for which application threads were stopped: 0.0002043 seconds, Stopping threads took: 0.0000170 seconds
2019-09-03T09:09:32.813+0000: 5.038: Total time for which application threads were stopped: 0.0002599 seconds, Stopping threads took: 0.0000181 seconds
2019-09-03T09:09:33.422+0000: 5.647: [GC (Allocation Failure) 2019-09-03T09:09:33.422+0000: 5.647: [ParNew
Desired survivor size 4358144 bytes, new threshold 1 (max 6)
- age   1:    5310088 bytes,    5310088 total
: 76504K->5800K(76672K), 0.0356554 secs] 76504K->8500K(2088640K), 0.0357362 secs] [Times: user=0.04 sys=0.00, real=0.03 secs] 
2019-09-03T09:09:33.458+0000: 5.682: Total time for which application threads were stopped: 0.0359225 seconds, Stopping threads took: 0.0000175 seconds
2019-09-03T09:09:33.792+0000: 6.016: [GC (Allocation Failure) 2019-09-03T09:09:33.792+0000: 6.016: [ParNew
Desired survivor size 4358144 bytes, new threshold 6 (max 6)
- age   1:    2432864 bytes,    2432864 total
: 73960K->4226K(76672K), 0.0211403 secs] 76660K->7807K(2088640K), 0.0212132 secs] [Times: user=0.02 sys=0.00, real=0.02 secs] 
2019-09-03T09:09:33.813+0000: 6.038: Total time for which application threads were stopped: 0.0213812 seconds, Stopping threads took: 0.0000172 seconds
2019-09-03T09:09:34.216+0000: 6.440: [GC (Allocation Failure) 2019-09-03T09:09:34.216+0000: 6.440: [ParNew
Desired survivor size 4358144 bytes, new threshold 6 (max 6)
- age   1:    3093736 bytes,    3093736 total
- age   2:     166024 bytes,    3259760 total
: 72386K->3512K(76672K), 0.0186490 secs] 75967K->7093K(2088640K), 0.0187124 secs] [Times: user=0.02 sys=0.00, real=0.02 secs] 
2019-09-03T09:09:34.235+0000: 6.459: Total time for which application threads were stopped: 0.0188732 seconds, Stopping threads took: 0.0000174 seconds
2019-09-03T09:09:34.603+0000: 6.828: [GC (Allocation Failure) 2019-09-03T09:09:34.603+0000: 6.828: [ParNew
Desired survivor size 4358144 bytes, new threshold 6 (max 6)
- age   1:    2145232 bytes,    2145232 total
- age   2:     258056 bytes,    2403288 total

Hi @groverjatin17,

The "GC (Allocation Failure)" messages are normal. The JVM is simply running out of space in "eden" and doing a young-generation GC to make room for more. I believe you are looking at the GC log file to see this information; I recommend reading up on the various messages (plenty of info is available).
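To make that concrete, here is a minimal Python sketch that pulls the numbers out of one of the summary lines from the log above. It assumes the usual ParNew layout, where the first `before->after(capacity)` group is the young generation and the second is the whole heap; the function name is made up for illustration.

```python
import re

# One summary line copied from the GC log posted in this thread.
GC_LINE = ": 68160K->8344K(76672K), 0.0242991 secs] 68160K->8344K(2088640K), 0.0243988 secs]"

# Matches groups such as "68160K->8344K(76672K)".
USAGE = re.compile(r"(\d+)K->(\d+)K\((\d+)K\)")


def parse_parnew_summary(line):
    """Return (young, heap); each is a dict with before/after/capacity in KB."""
    matches = USAGE.findall(line)
    if len(matches) != 2:
        raise ValueError("expected young-generation and whole-heap figures")
    keys = ("before_kb", "after_kb", "capacity_kb")
    young, heap = (dict(zip(keys, map(int, m))) for m in matches)
    return young, heap


young, heap = parse_parnew_summary(GC_LINE)
heap_used_pct = 100.0 * heap["after_kb"] / heap["capacity_kb"]

print(f"young gen: {young['before_kb']}K -> {young['after_kb']}K of {young['capacity_kb']}K")
print(f"heap after GC: {heap['after_kb']}K of {heap['capacity_kb']}K ({heap_used_pct:.2f}% used)")
```

For the line above, the heap in use after the collection is well under 1% of its capacity, which is what a healthy young-generation collection during startup looks like.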

Did you try lowering the heap size to something like 1 GB or 1.5 GB? If not, that is worth a try.
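After changing the heap setting, it may help to confirm which size the JVM actually picked up. The "CommandLine flags" line at the top of the GC log records it in bytes. The sketch below, in Python with a made-up function name, extracts `InitialHeapSize` and `MaxHeapSize` from a shortened copy of the line you posted and converts them to MiB.

```python
import re

# Shortened copy of the "CommandLine flags" line from the GC log in this thread.
FLAGS_LINE = (
    "CommandLine flags: -XX:+AlwaysPreTouch -XX:InitialHeapSize=2147483648 "
    "-XX:MaxHeapSize=2147483648 -XX:+UseConcMarkSweepGC"
)


def heap_sizes_mib(flags_line):
    """Extract InitialHeapSize and MaxHeapSize (bytes in the log) and return them in MiB."""
    sizes = {}
    for name in ("InitialHeapSize", "MaxHeapSize"):
        match = re.search(rf"-XX:{name}=(\d+)", flags_line)
        if match:
            sizes[name] = int(match.group(1)) // (1024 * 1024)
    return sizes


print(heap_sizes_mib(FLAGS_LINE))
```

With the posted line both values come out as 2048 MiB, so that run used a 2 GB heap. If a 1 GB heap had taken effect, both flags would read 1073741824 instead.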

Also, it could be worth checking /proc/meminfo while ES is not running, just to verify that enough memory is available (and not occupied by some other process).
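As a sketch of what to look for, the Python snippet below compares the size of the failed mapping from the hs_err log with the `MemAvailable` and `SwapFree` values from the same log. This is only a rough comparison and not the kernel's exact accounting; the helper names are made up for illustration.

```python
# Figures copied from the hs_err log posted in this thread.
REQUESTED_BYTES = 4207738880  # size of the mmap that failed
MEMINFO_EXCERPT = """\
MemTotal: 3880944 kB
MemAvailable: 190676 kB
SwapFree: 1627132 kB
"""


def read_fields(text):
    """Map each 'Name: value kB' line to an integer number of kB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB":
            fields[name.strip()] = int(parts[0])
    return fields


def shortfall_kb(request_bytes, fields):
    """kB by which the request exceeds MemAvailable + SwapFree (<= 0 means it fits)."""
    return request_bytes // 1024 - (fields["MemAvailable"] + fields["SwapFree"])


fields = read_fields(MEMINFO_EXCERPT)
requested_kb = REQUESTED_BYTES // 1024

print("requested kB:", requested_kb)
print("exceeds total RAM:", requested_kb > fields["MemTotal"])
print("shortfall kB:", shortfall_kb(REQUESTED_BYTES, fields))
```

In that snapshot the request is larger than the machine's total RAM, and `MemAvailable` is under 200 MB, so something else was holding most of the memory at the time. On the server, swap the excerpt for the live contents of /proc/meminfo taken while ES is stopped.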

Additionally, checking your limits with ulimit or looking into /etc/security/limits.conf (the location may depend on the Linux distribution) is advisable.
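If you would rather inspect the limits from a script than from the shell, here is a small sketch using Python's standard `resource` module. It assumes a Linux host and has to be run as the same user (and from the same kind of session) that starts Elasticsearch, since limits can differ per user. The labels and helper names are made up for illustration.

```python
import resource

# Limits that commonly matter for a JVM process; constants come from Python's resource module.
LIMITS = {
    "address space (AS)": resource.RLIMIT_AS,
    "processes/threads (NPROC)": resource.RLIMIT_NPROC,
    "open files (NOFILE)": resource.RLIMIT_NOFILE,
    "stack (STACK)": resource.RLIMIT_STACK,
}


def describe(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {label: (soft, hard)} for the current process."""
    return {label: resource.getrlimit(code) for label, code in LIMITS.items()}


for label, (soft, hard) in current_limits().items():
    print(f"{label}: soft={describe(soft)} hard={describe(hard)}")
```

The hs_err log above already shows the limits the JVM saw at crash time (AS infinity, NPROC 65536, NOFILE 65536), so this is mainly useful for confirming that nothing has changed since.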
