xiehaiwei
(xiehaiwei)
September 2, 2014, 4:46am
1
Hi all,
In our ES system, each row of a MySQL table is indexed as a
document, but indexing is slow.
My questions:
How much faster is indexing with the Bulk API than indexing documents one at a time?
If 'word segmentation' is the problem, how should we deal with it?
Can I use multiple nodes of an ES cluster to index into one index in parallel?
Thanks.
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com .
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4f7eae49-1bee-4bdd-9a8c-c9d1178fccdc%40googlegroups.com .
For more options, visit https://groups.google.com/d/optout .
Hello,
A few tips from my experience:
Disable refresh before bulk indexing and re-enable it once the load is done. By
default, ES refreshes the index every second, and documents only become
searchable after a refresh.
Reduce the number of replicas to 0 while bulk indexing.
Increase the number of machines and the number of shards. Indexing
happens in parallel across shards, so more machines, each holding a shard, will help.
"If 'word segmentation' is the problem" - please elaborate.
Thanks
Vineeth
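For reference, the first two tips can be sketched as request bodies. This is only a minimal sketch under some assumptions: it relies on the index settings `index.refresh_interval` and `index.number_of_replicas`, it assumes the bodies are sent to `PUT /<index>/_settings` and `POST /<index>/_bulk` (older versions also expect a type in the bulk URL), and the helper names, the `id` column and the batch size are hypothetical. It builds the payloads without contacting a cluster.

```python
import json


def bulk_load_settings():
    """Settings body to apply before a large bulk load (PUT /<index>/_settings)."""
    # "-1" turns periodic refresh off; 0 replicas avoids indexing every doc twice.
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restore_settings(refresh_interval="1s", replicas=1):
    """Settings body to apply once the load is finished (values are assumptions)."""
    return {"index": {"refresh_interval": refresh_interval,
                      "number_of_replicas": replicas}}


def chunks(rows, size):
    """Yield consecutive batches of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def bulk_body(rows, id_field="id"):
    """Build the newline-delimited JSON body for one bulk request.

    Each document is two lines: an action line, then the document source.
    The body must end with a newline.
    """
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_id": str(row[id_field])}}))
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical rows standing in for lines of the MySQL table.
    rows = [{"id": i, "title": "row %d" % i} for i in range(5)]
    print(json.dumps(bulk_load_settings()))
    for batch in chunks(rows, 2):
        print(bulk_body(batch), end="")
    print(json.dumps(restore_settings()))
```

Each batch then costs one HTTP round trip instead of one per document; the batch size that works best depends on document size and hardware, so it is worth measuring.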
xiehaiwei
(xiehaiwei)
September 2, 2014, 6:43am
3
Hi,
"If 'word segmentation' is the problem" means that the word
segmentation analyzer is slow:
about 1 MB/s when run on its own. In our case, many fields of a
document need to be segmented.
"more machines with a shard" - will a single shard run on multiple
nodes? Do you mean within a cluster?
Thanks.
Haiwei
Hello Haiwei,
The more hardware you can get, the better, unless the data set is too
small.
So if there are 10 machines, set the number of shards to 10 so that the index can
use all the resources evenly.
Thanks
Vineeth
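To illustrate the shard suggestion: each shard lives on a single node, and the shards of one index are spread across the nodes of the cluster. The sketch below assumes the index is created with the `number_of_shards` setting equal to the machine count, and it simulates how document IDs could spread over the shards. The hash used here (CRC32) is only a stand-in, not the routing hash Elasticsearch actually uses, and the function names are hypothetical.

```python
import zlib
from collections import Counter


def create_index_body(num_machines, replicas=0):
    """Index creation body with one primary shard per machine."""
    return {"settings": {"number_of_shards": num_machines,
                         "number_of_replicas": replicas}}


def shard_for(doc_id, num_shards):
    """Pick a shard as hash(id) modulo the shard count.

    CRC32 is a stand-in hash for illustration only.
    """
    return zlib.crc32(str(doc_id).encode("utf-8")) % num_shards


def distribution(doc_ids, num_shards):
    """Count how many documents land on each shard."""
    return Counter(shard_for(doc_id, num_shards) for doc_id in doc_ids)


if __name__ == "__main__":
    print(create_index_body(10))
    counts = distribution(range(14000), 10)
    for shard in sorted(counts):
        print("shard %d: %d docs" % (shard, counts[shard]))
```

Because every document is routed to exactly one shard, indexing work for a single index is divided among the machines that hold its shards.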
xiehaiwei
(xiehaiwei)
September 2, 2014, 9:05am
5
Hi Mohan,
In my latest test, I indexed about 14,000 documents.
After tuning the Bulk API parameters, it took 6 minutes. Before tuning, it took 14 minutes.
[INFO] Total time: 6:06.173s
[INFO] Finished at: Tue Sep 02 15:40:36 CST 2014
[INFO] Final Memory: 27M/312M
ref:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html#bulk
With the analyzer turned off for string fields, it took 18s.
[INFO] Total time: 18.499s
[INFO] Finished at: Tue Sep 02 15:52:47 CST 2014
[INFO] Final Memory: 29M/312M
So, is analysis of string fields the biggest performance problem when indexing?
Thanks.
Haiwei.
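The timings above can be turned into rough throughput figures, and the "analyzer off" case can be sketched as a mapping. This is a sketch under assumptions: the document count is the approximate 14,000 reported above, the mapping uses the `"index": "not_analyzed"` option for `string` fields as found in Elasticsearch versions of this period (newer versions use a different field type for unanalyzed strings), and the field names and helper names are hypothetical.

```python
def docs_per_second(num_docs, seconds):
    """Average indexing throughput for a run."""
    return num_docs / seconds


def not_analyzed_mapping(field_names):
    """Mapping that stores the given string fields without running an analyzer."""
    return {"properties": {name: {"type": "string", "index": "not_analyzed"}
                           for name in field_names}}


if __name__ == "__main__":
    docs = 14000                   # approximate count from the post
    tuned_seconds = 6 * 60 + 6.173  # "Total time: 6:06.173s"
    unanalyzed_seconds = 18.499     # "Total time: 18.499s"

    print("tuned bulk:   %.1f docs/s" % docs_per_second(docs, tuned_seconds))
    print("analyzer off: %.1f docs/s" % docs_per_second(docs, unanalyzed_seconds))
    print("ratio:        %.1fx" % (tuned_seconds / unanalyzed_seconds))

    # Hypothetical field names; only fields that do not need full-text
    # search are candidates for skipping analysis.
    print(not_analyzed_mapping(["status", "category"]))
```

A gap this large between the two runs points at analysis as the dominant cost, though unanalyzed fields can only be matched exactly, so this applies to fields that do not need segmentation.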