I am working on a monitoring project where the server sends each request's
metric data back for storage in Elasticsearch. Once per minute the metrics
are shipped off to the monitoring project, where they need to be inserted
into Elasticsearch. I am already using bulk index, but there could be times
where thousands of items need to be added at once. I am not sure if
Elasticsearch can handle something like this, what steps are skipped with
bulk index, or if there are any better ideas.
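[For reference, a minimal sketch of the setup described above: buffering per-request metric documents and flushing them as one bulk request per minute. It assumes the Java TransportClient API of the 0.90 era; the class name and the "metrics"/"request" index and type names are made up for illustration.]

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;

// Hypothetical shipper: metric documents accumulate in a queue and are
// flushed to Elasticsearch as a single bulk request once per minute.
public class MetricShipper {

    private final Client client; // an already-connected client
    private final ConcurrentLinkedQueue<String> buffer =
            new ConcurrentLinkedQueue<String>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public MetricShipper(Client client) {
        this.client = client;
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                flush();
            }
        }, 1, 1, TimeUnit.MINUTES);
    }

    // Called once per monitored request with the metric document as JSON.
    public void record(String jsonDoc) {
        buffer.add(jsonDoc);
    }

    private void flush() {
        BulkRequestBuilder bulk = client.prepareBulk();
        String doc;
        while ((doc = buffer.poll()) != null) {
            // "metrics"/"request" are illustrative index/type names.
            bulk.add(client.prepareIndex("metrics", "request").setSource(doc));
        }
        if (bulk.numberOfActions() == 0) {
            return; // nothing buffered this minute
        }
        BulkResponse response = bulk.execute().actionGet();
        if (response.hasFailures()) {
            // Item failures are reported per item, not as an exception.
            System.err.println(response.buildFailureMessage());
        }
    }
}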
Note, there are some limits you should observe: if you use HTTP, check
whether you need to raise the 100 MB upload limit
(http.max_content_length), and, on the ES node, check that your bulk
requests fit into the heap memory. They certainly should, but if you
transmit some GBs at once you might run into trouble with memory usage
and bandwidth (request transfer time).
I recommend organizing concurrent bulk requests to all available nodes,
and do not forget to evaluate the BulkResponse callbacks to limit
concurrency, so the cluster does not get overloaded.
Jörg
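[A minimal sketch of the pattern Jörg describes: fire bulk requests asynchronously and use the BulkResponse callback plus a semaphore to cap how many are in flight. The permit count is an arbitrary illustration, and the client calls assume the Java TransportClient API of that era.]

import java.util.List;
import java.util.concurrent.Semaphore;

import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;

public class ThrottledBulkIndexer {

    // Allow at most 4 bulk requests in flight at once (arbitrary limit).
    private final Semaphore inFlight = new Semaphore(4);
    private final Client client;

    public ThrottledBulkIndexer(Client client) {
        this.client = client;
    }

    public void index(List<String> jsonDocs) throws InterruptedException {
        BulkRequestBuilder bulk = client.prepareBulk();
        for (String doc : jsonDocs) {
            // "metrics"/"request" are illustrative index/type names.
            bulk.add(client.prepareIndex("metrics", "request").setSource(doc));
        }
        // Block until a slot frees up, so the cluster is not overloaded.
        inFlight.acquire();
        bulk.execute(new ActionListener<BulkResponse>() {
            public void onResponse(BulkResponse response) {
                inFlight.release(); // the callback frees the slot
                if (response.hasFailures()) {
                    System.err.println(response.buildFailureMessage());
                }
            }

            public void onFailure(Throwable t) {
                inFlight.release();
                t.printStackTrace();
            }
        });
    }
}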
Okay, that is pretty interesting to think about. I am trying to structure
the data to be smaller and remove redundant information. Would using
synchronous or asynchronous replication be better when doing large bulk
inserts?
The upcoming 0.90 has a lot of performance optimizations for this and
should vastly reduce the impact of doing big bulk additions on a live
cluster. Your best option is probably to bulk index maybe a thousand
documents at a time. You can use multiple threads as well to speed things
up. With all the Lucene 4 work in 0.90, this should have only a minimal
impact. Everything has a price, of course, but this sounds like it should
be no problem. I've seen Elasticsearch handle bulk indexing of tens of
thousands of documents per second while still serving read traffic.
Jilles
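[A sketch of the batching Jilles suggests: split the pending documents into chunks of roughly a thousand and index each chunk as its own bulk request from a small thread pool. The chunk size and thread count here are illustrative, not recommendations from the thread, and the client calls again assume the Java TransportClient API of that era.]

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;

public class ChunkedBulkIndexer {

    private static final int CHUNK_SIZE = 1000; // "maybe a thousand documents at a time"
    private final ExecutorService pool = Executors.newFixedThreadPool(4); // illustrative
    private final Client client;

    public ChunkedBulkIndexer(Client client) {
        this.client = client;
    }

    public void indexAll(final List<String> jsonDocs) {
        for (int start = 0; start < jsonDocs.size(); start += CHUNK_SIZE) {
            final List<String> chunk =
                    jsonDocs.subList(start, Math.min(start + CHUNK_SIZE, jsonDocs.size()));
            pool.submit(new Runnable() {
                public void run() {
                    BulkRequestBuilder bulk = client.prepareBulk();
                    for (String doc : chunk) {
                        // "metrics"/"request" are illustrative index/type names.
                        bulk.add(client.prepareIndex("metrics", "request").setSource(doc));
                    }
                    BulkResponse response = bulk.execute().actionGet();
                    if (response.hasFailures()) {
                        System.err.println(response.buildFailureMessage());
                    }
                }
            });
        }
    }
}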
Okay, perfect. I can't wait to use it with what I am working on. Currently
Elasticsearch is doing an amazing job on my small test VM. I will post a
few more questions about other features I am thinking about using.