Elasticsearch and streaming logs

Hi all,

I have a question about storing and indexing streaming logs. I am currently
testing this scenario to find the best way to organize and configure my
setup.

I have a cluster with four nodes (all in virtual machines):

  • two data nodes (data: true; master: false) with 4 GB RAM

  • one master node (data: false; master: true) with 2 GB RAM

  • one web frontend (data: false; master: false) with 2 GB RAM - it is used
    only as an access point for external applications like Kibana

Data is pulled from RabbitMQ via the RabbitMQ river.

Because the indexed data is not plain text, I have created a probably more
aggressive indexing configuration like this:

"settings": {

    "index": {

        "number_of_shards": 20,

        "number_of_replicas": 1,

        "refresh_interval" : "30s",

        "merge.policy.merge_factor": 30,

        "store.throttle.max_bytes_per_sec": "5mb", 

        "store.throttle.type": "merge",

        "analysis": {

            "filter": {

                "mynGram": {

                    "type": "nGram",

                    "min_gram": 2,

                    "max_gram": 50

                }

            },

            "analyzer": {

                "a1": {

                    "type": "custom",

                    "tokenizer": "whitespace",

                    "filter": [

                        "lowercase",

                        "mynGram"

                    ]

                }

            }

        }

                                            

    }

},
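
To illustrate how aggressive this analyzer is: with min_gram 2 and max_gram
50, a whitespace token of length n (n <= 50) is expanded into n(n-1)/2
terms, so a single 10-character token already produces 45 terms. The
analyze API shows the expansion (a quick sketch against my test index,
assuming it is named "delme" as in the errors below):

curl -XGET 'localhost:9200/delme/_analyze?analyzer=a1' -d 'Disconnect'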

I am a bit concerned about how many messages can be indexed without ES
crashing.

If I create the index without replicas, I am able to store and index about
2000 entries per second. When I try to store more data, or when I have
replicas, I get errors like this:

[54]: index [delme], type [logentry], id [7zIkj_i8RpOBm65ShrZS_A], message
[RemoteTransportException[[Cardiac][inet[/172.31.80.82:9300]][bulk/shard]];
nested: EsRejectedExecutionException[rejected execution of
[org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1]];
]

[2013-10-07 08:07:55,212][WARN ][cluster.action.shard ] [Crimson and
the Raven] received shard failed for [delme][1],
node[RC-iJe40TCqMKtV4SMam2A], [R], s[INITIALIZING], reason [Failed to start
shard, message [RecoveryFailedException[[delme][1]: Recovery failed from
[Alibar][8UmjyySBR7mb_TTfzq2hQg][inet[/172.31.80.83:9300]]{master=false}
into
[Cardiac][RC-iJe40TCqMKtV4SMam2A][inet[/172.31.80.82:9300]]{master=false}];
nested:
RemoteTransportException[[Alibar][inet[/172.31.80.83:9300]][index/shard/recovery/startRecovery]];
nested: RecoveryEngineException[[delme][1] Phase[3] Execution failed];
nested:
SendRequestTransportException[[Cardiac][inet[/172.31.80.82:9300]][index/shard/recovery/translogOps]];
nested: OutOfMemoryError[Java heap space]; ]]

I have a few questions:

  • Does anyone have an idea what causes these errors and how to fix them?

  • How many documents should I expect to be able to store and index with
    this setup?

  • Are there any articles or advice on how to create and configure an ES
    cluster and indexes that can store and index this volume of streaming
    logs? Which specific configuration and architecture options should I pay
    attention to?

I need to evaluate this setup to see whether we can use ES in this case, and
any help will be appreciated.

Thanks in advance.

Best regards,

Nickolay Kolev


Check your RabbitMQ river: the messages assembled there are too large, so
you get the OOM.

You should be able to control bulk size and bulk concurrency there.
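
For example, registering the river along these lines should let you lower
the bulk size (a sketch only; the river name, host, queue, and values are
placeholders to adapt):

curl -XPUT 'localhost:9200/_river/logs_river/_meta' -d '{
    "type": "rabbitmq",
    "rabbitmq": {
        "host": "rabbitmq.example.com",
        "queue": "elasticsearch"
    },
    "index": {
        "bulk_size": 500,
        "bulk_timeout": "10ms"
    }
}'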

In general, you can index many hundreds of millions of docs with your
machines. Ngrams may slow this down. And do not expect much performance from
virtual machines.

Jörg

Hello Jörg,

Thanks for your answer. I will add more machines and continue testing to
discover the performance of different setups and find one that works for me.
Unfortunately, I can only test on virtual machines in an OpenStack
environment. I added four more machines with 2 GB RAM and discovered that
when the index has one shard per node, there are no errors and about 1500
messages per second are indexed. It looks like the relocation of replicas
fails otherwise.
This was tested with OpenJDK 1.6.

Then I switched to Sun JDK 1.7 and things got better. Now the cluster seems
more stable, and there are no crashes or rejected messages.

I did not find any option for throttling, at least in the RabbitMQ river (I
read the docs and the source code). Is this generally available somewhere?

Also, I did not find any advice on which node to install the RabbitMQ river;
currently it is installed on the master node, but I am not sure this is
correct. Are there any best practices for this?

I am using ES 0.90.3

Best regards,
Nickolay Kolev


I looked at the RabbitMQ river; you can try to modify the bulk_size
parameter according to your needs. A low bulk size means little memory
overhead, but less throughput and a bit more work on the network.

The RabbitMQ river is very simple and performs no bulk request concurrency;
that is, a bulk response is expected within a certain time after each
request, in strict sequential order, which makes sense for a message queue.

bulk_timeout should be raised if the cluster is expected to perform heavy
indexing, for example to 60 seconds.
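
Concretely, re-registering the river with both settings tuned would look
something like this (again a sketch; the river name and the values are
placeholders):

curl -XPUT 'localhost:9200/_river/logs_river/_meta' -d '{
    "type": "rabbitmq",
    "index": {
        "bulk_size": 100,
        "bulk_timeout": "60s"
    }
}'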

The switch to Java 7 is a good choice. OpenJDK 6 is outdated and is
affected by bugs, resulting in erratic memory use.

Jörg


Hello Jörg,

Thanks for your time and support. I was trying different setups (horizontal
and vertical scaling), observed high CPU load, and found the problem and
probably the solution.

The problem was the too-aggressive analyzer. When I started to learn and
play with Elasticsearch two months ago I had almost no knowledge in this
area, and testing with a few documents did not show me the problems that I
am seeing these days. Changing the way I analyze and index the incoming data
boosted indexing throughput three times, and now I have the desired
performance.
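
For anyone who finds this later: the direction of the change was to tame
the nGram expansion. As a sketch (not my exact production mapping), capping
max_gram alone makes an enormous difference in the number of generated
terms:

"filter": {
    "mynGram": {
        "type": "nGram",
        "min_gram": 2,
        "max_gram": 5
    }
}

With max_gram 5, a 50-character token produces 190 terms instead of the
1225 it produces with max_gram 50.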

Thanks again and have a nice day.

Best regards,
Nickolay Kolev
