Strange problem: my ES server almost lost all its data. (All shards failed)


(Patrick Proniewski) #1

Hello,

I'm running a small server with Logstash, ES, and Kibana. Tonight I restarted my ES process. Very bad idea: it restarted with lots of errors, and finally "lost" all its data.
Basically, before the restart, I had:

elasticsearch/nodes/0/indices/logstash-2014.*
elasticsearch/nodes/0/_state/

After the restart, I had:

elasticsearch/nodes/0/indices/logstash-2014.*
elasticsearch/nodes/0/_state/
elasticsearch/nodes/1/indices/logstash-2014.05.01
elasticsearch/nodes/1/_state/

Then, Kibana was not able to find anything (dashboards lost, etc.).

I stopped Logstash, stopped Elasticsearch, waited a bit, checked that everything was down, then restarted ES. It looked OK, so I restarted Logstash, and I was able to access my dashboards again. I only lost 15 minutes of data.
Now I can see that elasticsearch/nodes/0 is the directory currently in use, and I can browse both old and current data.
elasticsearch/nodes/1 is not used anymore.

I'm running FreeBSD and used the service command to restart ES. On the second shutdown attempt, the script couldn't find the pid file, so I had to kill the Java process.

I don't understand what happened, but I don't feel comfortable putting ES in production. The full log for the first and second restarts is here: http://patpro.net/elastic.log

Any idea?
Regards,
Patrick

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/DEC08780-FC7C-44F7-B7B8-B70215060351%40patpro.net.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

On FreeBSD, do you have multicast on IPv6 enabled? You should disable IPv6
on the JVM.

Seems you received a severe network error from the OS.

Jörg



(Patrick Proniewski) #3

Hi Jörg,

Thank you for your reply.
The service script includes an option that might deal with IPv6, but it's not active:

Force the JVM to use IPv4 stack

elasticsearch_props="-Djava.net.preferIPv4Stack=true"

(http://svnweb.freebsd.org/ports/head/textproc/elasticsearch/files/elasticsearch.in?revision=349955)

In past years, I used to disable IPv6 everywhere (kernel, ports compilation, etc.) but now I don't bother anymore.
Do you mean I should use this option to force IPv4?

Thanks,
Patrick



(Jörg Prante) #4

Yes, you should use this option.

Some FreeBSD kernels seem to have difficulty running UDP multicast on IPv6
together with IPv4, so I suggest disabling IPv6 use
on the JVM.
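
As a rough sketch, activating the option could look like the line below. It assumes the port's rc script reads the elasticsearch_props variable from /etc/rc.conf; both the file location and the exact variable spelling should be checked against the installed script.

```sh
# Assumed to live in /etc/rc.conf, next to the other elasticsearch_* settings.
# Asks the JVM to prefer the IPv4 stack, so ES multicast discovery avoids IPv6.
elasticsearch_props="-Djava.net.preferIPv4Stack=true"
```

ES has to be restarted for the JVM property to take effect.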

Jörg



(Patrick Proniewski) #5

Thank you for the tip, Jörg.
I've activated this option and carefully restarted. I've re-read yesterday's log file, and now I think that maybe the new ES instance started before the former one had completely terminated. This too can cause network/socket trouble. I might try adding a short sleep to the restart command.


