Production ES help

Hi All,
I have a 4-node production cluster, all Red Hat based servers. All nodes are master and data eligible. Each machine has 24 GB of memory; I have given 12 GB to the heap and left the other 12 GB for the OS and Lucene, as recommended by Elasticsearch. Now I am seeing the heap fill up quickly, and Java is not able to reclaim it through GC.
Our search operations are somewhat heavy, but the heap space should normally be recovered by GC, and that is not happening.

Now my questions are (we are on ES 1.5.2):

  1. Do we need to set any parameters to recover heap space more quickly?
  2. Can I increase the ES heap size to 15 GB and leave the remaining 9 GB for the server?
  3. Please share your recommendations on this.
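For reference, heap usage and GC activity per node can be inspected through the REST APIs (a minimal sketch, assuming the nodes answer on localhost:9200):

  # JVM heap usage plus GC counts and times for every node
  curl 'http://localhost:9200/_nodes/stats/jvm?pretty'

  # compact per-node overview of heap pressure
  curl 'http://localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max,ram.percent'

If the old generation stays close to full even while the cluster is idle, that usually points at something being held on the heap (fielddata, for example) rather than at GC tuning.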

Thanks ....

phani

The first thing you should do is upgrade.

Thank you Mark, I can upgrade, but do you have any clue why my ES heap usage is increasing heavily even though we are not performing any operations? Please advise.

I am seeing the following hot_threads output on one of my nodes:

:: [node3][8BZti1iGTf68soGQR78BWw][prod-nosql03][inet[/172.16.100.28:9300]]{master=true}
Hot threads at 2016-11-14T09:23:08.979Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

1.7% (8.5ms out of 500ms) cpu usage by thread 'RMI TCP Connection(39)-172.16.100.6'
 10/10 snapshots sharing following 16 elements
   java.net.SocketInputStream.socketRead0(Native Method)
   java.net.SocketInputStream.read(SocketInputStream.java:152)
   java.net.SocketInputStream.read(SocketInputStream.java:122)
   java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
   java.io.BufferedInputStream.read(BufferedInputStream.java:254)
   java.io.FilterInputStream.read(FilterInputStream.java:83)
   sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:549)
   sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:828)
   sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.access$400(TCPTransport.java:619)
   sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$1.run(TCPTransport.java:684)
   sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$1.run(TCPTransport.java:681)
   java.security.AccessController.doPrivileged(Native Method)
   sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:681)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   java.lang.Thread.run(Thread.java:745)
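For context, this output comes from the hot threads API; the busiest thread shown here is an RMI connection handler, which looks like the JMX agent rather than Elasticsearch search or merge work. A minimal sketch of the call, assuming the default HTTP port:

  # busiest threads on every node
  curl 'http://localhost:9200/_nodes/hot_threads'

  # or limit it to the node shown above
  curl 'http://localhost:9200/_nodes/node3/hot_threads'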

Thanks
phani

Hi @warkolm,

Can I increase the node heap size to 16 GB? I have 24 GB of RAM on the machine and have allocated 12 GB, but heap consumption still reaches 99% and the garbage collector is not reclaiming it; may I know the reason?
I left half of the RAM to the machine, but the old gen is still not being recovered. Can I increase the heap to 16 GB on just my node 4, or do I need to change it on all four nodes in the prod cluster?
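For reference, heap pressure can be compared across the four nodes before deciding whether to change only one of them (a minimal sketch, assuming localhost:9200):

  # one line per node: heap in use versus the configured maximum
  curl 'http://localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.current,heap.max'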

Please advise.

thanks,
phani

How much data do you have in the cluster? How many indices/shards? What kind of load, querying and indexing, is the cluster under?

Hi @Christian_Dahlqvist

Thank you. I have 1.5 billion docs on the prod cluster, about 180 GB in total, spread across 52 indices; most indices are around 4 GB, some are 7 to 8 GB, and the rest are smaller. We are just querying ES to fetch data, not performing heavy aggregations. For indexing, we push about 1 million documents per day into the various indices. All nodes are master eligible and all are data nodes.
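For completeness, the per-index sizes, document counts, and shard layout described above can be pulled with the cat APIs (a sketch, assuming the default port):

  # size, doc count, and shard/replica settings for every index
  curl 'http://localhost:9200/_cat/indices?v&h=index,docs.count,store.size,pri,rep'

  # where each shard is allocated
  curl 'http://localhost:9200/_cat/shards?v'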

Thanks
phani.

That sounds reasonable. Do you have any non-default node configuration that could drive heap usage?

I have 5 shards and 1 replica per index. I derived the heap settings from the ES recommendation of half the machine RAM, i.e. 12 GB on each node. I ran swapoff and did not enable bootstrap.mlockall: true because swapping is already disabled. I have integrated JMX and defined the following property for the fielddata cache size:

indices.fielddata.cache.size: 5gb
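For what it's worth, actual fielddata usage can be checked against that 5gb cap (a minimal sketch, assuming the default port and that _cat/fielddata is available on 1.5):

  # fielddata memory held on each node
  curl 'http://localhost:9200/_cat/fielddata?v'

  # the same information through the node stats API
  curl 'http://localhost:9200/_nodes/stats/indices/fielddata?pretty'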

I was also thinking of adding the following properties; would you recommend them?

indices.breaker.fielddata.limit: 60%
indices.breaker.request.limit: 40%
indices.breaker.total.limit: 70%
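If it helps, these breaker limits are dynamic settings, so they could be tried at runtime through the cluster settings API before being written into elasticsearch.yml (a sketch, assuming the default port and the values listed above):

  curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
    "persistent": {
      "indices.breaker.fielddata.limit": "60%",
      "indices.breaker.request.limit": "40%",
      "indices.breaker.total.limit": "70%"
    }
  }'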

Apart from these, everything else is left at the defaults, and I have enabled CORS for my search apps.
