Data Loss

Here is the Elasticsearch config of my development ES 1.0 cluster (3
nodes). I'm still experimenting with some settings.

Server: HP DL 165 G7 (64GB RAM), 1TB RAID0 SAS, OS: RHEL 6.3, JVM: Java 8
FCS

Jörg

bootstrap:
  mlockall: true

store:
  index:
    type: mmapfs

network:
  host: local

transport:
  ping_timeout: 30s
  tcp:
    port: 19300

http:
  port: 19200

cluster:
  name: "zbn-1.0"

index:
  codec:
    bloom:
      load: false
  number_of_shards: 3
  number_of_replicas: 1
  merge:
    policy:
      max_merged_segment: 1gb
      segments_per_tier: 24
      max_merge_at_once: 4
  similarity:
    default:
      type: BM25

indices:
  recovery:
    concurrent_streams: 8

gateway:
  type: "local"
  recover_after_nodes: 3
  recover_after_time: 15m
  expected_nodes: 3

threadpool:
  bulk:
    type: fixed
    size: 64
    queue_size: 128

discovery:
  zen:
    ping_timeout: 5s
    minimum_master_nodes: 2
On Thu, Feb 13, 2014 at 11:20 PM, joergprante@gmail.com <
joergprante@gmail.com> wrote:

I keep my shards small (1-5 GB) and I do not disable shard allocation;
recovery is fast enough (a few minutes).

Gateway is local, the default.

I keep backups on the file system, with plain cp -a or rsync, but also on
the data source side - I can reindex everything in a few hours, since I have
only ~100 million docs to index.

Because I don't operate a shared file system, I cannot use ES 1.0
snapshot/restore yet.
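
(As a rough sketch, once a shared file system such as an NFS export is mounted
on every data node, registering a 1.0 filesystem snapshot repository and taking
a snapshot would be along these lines - repository name and mount path are
made-up placeholders, HTTP port 19200 as in the config above:)

# register a filesystem repository (the path must be visible to all nodes)
curl -XPUT 'localhost:19200/_snapshot/my_backup' -d '{
  "type": "fs",
  "settings": { "location": "/mnt/es_backups/my_backup" }
}'

# take a snapshot and wait for it to finish
curl -XPUT 'localhost:19200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'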

Jörg

On Wed, Feb 12, 2014 at 11:27 PM, Mohit Anchlia mohitanchlia@gmail.com wrote:

Do you have special settings for the "disable shard allocation" and "gateway"
modules? Would you be able to share those settings? There are numerous
settings and I am trying to understand which ones make the most sense.

How are you taking backups of data today? I believe a new incremental
backup feature is slated for 1.0.

Any other best practices you can share would be helpful.

If you disable shard allocation, you disable the internal automatic load
balancing, and ES will no longer be able to help distribute search/index load
by itself. Personally I am reluctant to disable shard allocation (except for
node decommissioning) because I really like the magic behind the automatic
shard movements.
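
(If you do need to switch allocation off temporarily, e.g. around a node
restart, a rough sketch with the cluster update settings API would be as
follows - cluster.routing.allocation.enable is the 1.0 form of the setting,
older releases used cluster.routing.allocation.disable_allocation:)

# stop allocating shards before taking a node down
curl -XPUT 'localhost:19200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "none" }
}'

# ... restart / maintain the node ...

# switch allocation back on afterwards
curl -XPUT 'localhost:19200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'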

The shard moves are designed to run in the background. Unless your shards
are gigantic, or your shard numbers are huge, this process will not
influence the normal cluster operation. It will just take some time to
complete, which might be bothersome. In that case, maybe the total number of
nodes does not suffice to handle shard movements and should be increased.
This is a matter of sizing the whole cluster correctly.

The replica level can be increased for two reasons: first, to distribute
search load; second, to tolerate more simultaneous node failures.
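
(As a minimal sketch, the replica count can be changed on a live index via
the update settings API - "myindex" is just a placeholder:)

# raise the number of replicas of an existing index from 1 to 2
curl -XPUT 'localhost:19200/myindex/_settings' -d '{
  "index": { "number_of_replicas": 2 }
}'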

Jörg

On Thu, Feb 13, 2014 at 11:33 PM, Mohit Anchlia mohitanchlia@gmail.com wrote:

This is interesting. I was thinking that by having more replicas we can work
around issues when a few nodes go down. Is this more challenging than that? I
am trying to understand where I should focus to get an HA search solution. We
have large data sets, so I also worry about too much index movement causing
performance issues. Is there a downside to completely disabling shard
allocation?

Thanks for sharing all the details. Have you come across any data loss
situations during shard allocation? It looks like most of the data loss
issues are somewhat related to shard movement.

No data loss here (except one case some months ago, with an older version
of Lucene, a wrong file descriptor limit setting, and only test data).

No problems with shard movement here.

It helped me:

  • to have a robust network environment with low latency and high throughput
  • to keep up with the latest ES version
  • to use the latest JVM version
  • to reindex the source data in case the Lucene version changed
  • to set the heap around 4-8 GB per node
  • and to use G1GC, for fewer GC pauses (see the sketch after this list)
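
(A minimal sketch of how the last two points could be wired into the stock
1.x start scripts - exact file names depend on the install, and the stock
script configures CMS by default:)

# heap size is picked up by bin/elasticsearch.in.sh
export ES_HEAP_SIZE=6g        # somewhere in the 4-8 GB range per node

# the 1.x start script sets the CMS collector flags in bin/elasticsearch.in.sh;
# to try G1GC, those flags have to be replaced there with something like:
#   JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC"

bin/elasticsearch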

A node running low on free resources (heap, OS file descriptors, disk space)
is the most common cause of trouble. If the JVM is not able to allocate
write buffers, all kinds of bad things happen (lost and corrupted files, etc.)
that may harm shard recovery.
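
(One way to spot a node that is short on file descriptors is the nodes
info/stats API; a rough sketch with the 1.0 endpoints:)

# configured maximum file descriptors per node
curl 'localhost:19200/_nodes/process?pretty'

# currently open file descriptors per node
curl 'localhost:19200/_nodes/stats/process?pretty'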

Jörg

Thanks! It looks like you've found a perfect fit, and I assume you have
learned lessons over time :-) Thanks for sharing.

Jörg,

So if your shards are at most 5GB and you have 3x2 shards, then your data,
per index, is "only" 30GB?

I don't know why, maybe because it is what I used in Lucene, but I have
always kept segments_per_tier and max_merge_at_once at the same value. I had
them higher than the default of 10, but I slowly reduced them back to the
default. How do you tune BM25? It is hard to debug, since you cannot change
similarities on the fly.

Cheers,

Ivan

Yes, index sizes vary between 10 and 50 GB.

segments_per_tier is high because I allow many segments; I don't care about
optimal search performance (unless I run an optimize down to a single segment
after a bulk run).

max_merge_at_once is low because I do not want much merge thread activity
during bulk indexing.

I'm experimenting with Okapi BM25. It is better for docs with many short
fields and short query terms. From what I can see, it works better with
library catalog metadata than the default Lucene ranking formula.
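
(Since the default similarity cannot be changed on a live index, a rough way
to compare settings is to create a throwaway index with tuned BM25 parameters
and reindex into it - the index name is a placeholder, k1/b are the standard
Okapi parameters, and the values below are just the Lucene defaults:)

# create a test index with an explicit BM25 configuration, then reindex into it
curl -XPUT 'localhost:19200/catalog-bm25-test' -d '{
  "settings": {
    "similarity": {
      "default": { "type": "BM25", "k1": 1.2, "b": 0.75 }
    }
  }
}'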

Jörg
