Missing OS statistics on data nodes


(Alexander Vassilevski) #1

Hey guys,

I'm sorry if this is a duplicate post. I did search the group before writing this, and I'd be more than glad to follow a URL if someone has already solved this exact problem.

Here's a short description of the problem:

I have a cluster with 5 data nodes (all master-eligible, with a minimum of 3 masters). I also have a non-data, non-master node which I use for load balancing search and indexing requests (it's the REST endpoint).

The problem is that for data nodes 4 and 5 I am getting no OS statistics ( the ones provided by the sigar lib ).

The only difference between nodes 1-3 and nodes 4-5 is that the last two were installed via puppet.
There are no significant elasticsearch.yml config differences (only the IPs and hostnames differ), and I haven't set anything unique on the last two nodes via persistent/transient cluster settings, node settings, or index settings.
I have tried restarting the individual nodes one by one as well as the cluster as a whole, and neither resolved the issue.

I am willing to provide config files, program output, or anything else that helps get this issue resolved - just let me know!

The Elasticsearch version I'm using on all the nodes is 1.5.0; Java is Oracle 1.7.0_75.

-Alex V


(Antonio Bonuccelli) #2

Can you check whether the kernel versions differ between the nodes?
Sigar does not support the newest kernels.


(Alexander Vassilevski) #3

Actually, there are some differences between the kernel versions ( they are all CentOS 7 machines):

On the working nodes, the kernel version is like this:

3.10.0-123.20.1.el7.x86_64

On the broken nodes, the kernel version is:

3.10.0-123.el7.x86_64

Can you point me to the source you are referring to?
Is there a version of sigar lib jar which is immune to this problem?


(Alexander Vassilevski) #4

Actually - your post got me thinking and I saw this page:
https://support.hyperic.com/display/SIGAR/Home

After testing out:

[root@esdata-003 lib]# java -Djava.library.path='/usr/share/elasticsearch/lib/sigar/' -jar sigar/sigar-1.6.4.jar 
sigar> 

on the working node and

[root@esdata-004 sigar]# java -Djava.library.path='/usr/share/elasticsearch/lib/sigar/' -jar sigar-1.6.4.jar 
no libsigar-amd64-linux.so in java.library.path
org.hyperic.sigar.SigarException: no libsigar-amd64-linux.so in java.library.path
        at org.hyperic.sigar.Sigar.loadLibrary(Sigar.java:172)
        at org.hyperic.sigar.Sigar.<clinit>(Sigar.java:100)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:191)
        at org.hyperic.sigar.SigarLoader.class$(SigarLoader.java:77)
        at org.hyperic.sigar.SigarLoader.getLocation(SigarLoader.java:77)
        at org.hyperic.sigar.cmd.Runner.main(Runner.java:176)
java.lang.UnsatisfiedLinkError: org.hyperic.sigar.util.Getline.isatty()Z
        at org.hyperic.sigar.util.Getline.isatty(Native Method)
        at org.hyperic.sigar.util.Getline.<clinit>(Getline.java:34)
        at org.hyperic.sigar.shell.ShellBase.init(ShellBase.java:91)
        at org.hyperic.sigar.cmd.Shell.main(Shell.java:225)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.hyperic.sigar.cmd.Runner.main(Runner.java:236)

on one of the failing nodes, I realized that I was missing some libraries (libsigar-amd64-linux.so among them) in the files directory of my puppet module.

Most likely this is what was causing the problem with operating system stats!

Thanks for the idea.
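
For anyone hitting the same thing, the check above can be reduced to a quick file test. This is only a minimal sketch: `check_sigar_lib` is a hypothetical helper name (not part of Elasticsearch or Sigar), and it assumes a 64-bit Linux node, where the file Sigar looks for is the one named in the error message.

```shell
# Sketch: report whether the Sigar native library for 64-bit Linux is present
# in a given directory. check_sigar_lib is a hypothetical helper, not a tool
# shipped with Elasticsearch or Sigar.
# Usage: check_sigar_lib <sigar-dir>
check_sigar_lib() {
    local dir="$1"
    # File name taken from the "no libsigar-amd64-linux.so" error message.
    local lib="libsigar-amd64-linux.so"
    if [ -f "$dir/$lib" ]; then
        echo "OK: $dir/$lib found"
        return 0
    else
        echo "MISSING: $lib not in $dir"
        return 1
    fi
}
```

On the nodes in this thread it would be called as `check_sigar_lib /usr/share/elasticsearch/lib/sigar`. A non-zero exit status means the native library is absent, which fits a node that reports no OS statistics.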


(Antonio Bonuccelli) #5

Sorry, I got sidetracked by a busy day :slight_smile: but it looks like you've found your way! You're welcome.

