How to Speed Up Indexing


(xiehaiwei) #1

Hi all,
In our ES system, each row of a MySQL table is indexed as a
document, but indexing is slow.

My questions:

  1. How much faster is indexing with the Bulk API than indexing documents one at a time?
  2. If "word segmentation" is the bottleneck, how can we deal with it?
  3. Can I use multiple nodes of an ES cluster to index into one index in parallel?

Thanks.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4f7eae49-1bee-4bdd-9a8c-c9d1178fccdc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(vineeth mohan-2) #2

Hello,

A few tips from my experience:

  1. Disable refresh before bulk indexing and re-enable it once it is done. By
    default, ES refreshes every 1 second, which makes all documents indexed
    during that time searchable.
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html#bulk
  2. Reduce the number of replicas to 0 while bulk indexing.
  3. Increase the number of machines and the number of shards. Indexing
    happens in parallel across shards, so more machines, each holding a shard, will help.
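
Tips 1 and 2 above, together with the Bulk API from the original question, could be sketched roughly as follows. This is only a minimal sketch that builds the request bodies and sends nothing: the helper names, the index name and the `id` column are assumptions made for illustration, and the bodies are meant for `PUT /<index>/_settings` and `POST /_bulk`. Older ES releases also expect a `_type` in the bulk action line, which is left out here.

```python
import json


def bulk_load_settings(replicas_after=1, refresh_after="1s"):
    """Return (before, after) bodies for PUT /<index>/_settings.

    'before' turns periodic refresh off and drops replicas for the load;
    'after' restores them. The restore values are assumptions; use
    whatever the index had before the load.
    """
    before = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}
    after = {
        "index": {
            "refresh_interval": refresh_after,
            "number_of_replicas": replicas_after,
        }
    }
    return before, after


def bulk_body(index, rows, id_field="id"):
    """Build the newline-delimited JSON body for POST /_bulk.

    Each row becomes two lines: an action line and the document itself.
    The body must end with a newline.
    """
    if not rows:
        return ""
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index, "_id": row[id_field]}}))
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n"


# Hypothetical rows standing in for lines of the MySQL table.
rows = [{"id": 1, "title": "first row"}, {"id": 2, "title": "second row"}]
before, after = bulk_load_settings()
print(json.dumps(before))
print(bulk_body("mysql_rows", rows), end="")
print(json.dumps(after))
```

The idea is to send `before`, then the bulk requests in batches, then `after`. A good batch size depends on document size and hardware, so it is worth measuring a few values.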

"If 'word segmentation' is the problem" - please elaborate.

Thanks
Vineeth




(xiehaiwei) #3

Hi,

"If 'word segmentation' is the problem" means that the word segmentation
analyzer is slow: about 1 MB/s when run on its own. In our case, many fields
of a document need to be segmented.

"More machines with a shard": will a single shard run on multiple nodes? Do
you mean within a cluster?

Thanks.
Haiwei



(vineeth mohan-2) #4

Hello Haiwei,

The more hardware you have, the better, unless the data set is too
small.
So if there are 10 machines, set the number of shards to 10, so that the index
can use all the resources evenly.
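
As a rough illustration of the "one shard per machine" suggestion, the body for creating such an index (`PUT /<index>`) might look like the sketch below. The helper name and the replica default are assumptions. Note that the number of primary shards is set when the index is created, so it has to be chosen before loading the data.

```python
import json


def create_index_body(data_nodes, replicas=0):
    """Build a create-index body with one primary shard per data node."""
    if data_nodes < 1:
        raise ValueError("need at least one data node")
    return {
        "settings": {
            "number_of_shards": data_nodes,
            "number_of_replicas": replicas,
        }
    }


# Ten machines, as in the example above: ten primary shards.
print(json.dumps(create_index_body(10), indent=2))
```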

Thanks
Vineeth




(xiehaiwei) #5

Hi Mohan,

In my latest test, I indexed about 14,000 documents.
  1. After tuning the Bulk API parameters: 6 minutes. Before tuning, it took 14 minutes.
    [INFO] Total time: 6:06.173s
    [INFO] Finished at: Tue Sep 02 15:40:36 CST 2014
    [INFO] Final Memory: 27M/312M
    ref:
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html#bulk

  2. With analysis of string fields turned off: 18 seconds.
    [INFO] Total time: 18.499s
    [INFO] Finished at: Tue Sep 02 15:52:47 CST 2014
    [INFO] Final Memory: 29M/312M

So, is the analysis of string fields the biggest performance problem?
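
For anyone wanting to repeat the second test, turning analysis off for only some string fields is done in the mapping. The sketch below is an assumption about how that could look, not the mapping used in the test above: the field names are hypothetical, `legacy=True` produces the older `string` / `not_analyzed` form, and `legacy=False` produces the `text` / `keyword` form used by newer ES releases.

```python
import json


def string_field_mapping(analyzed, legacy=True):
    """Mapping for one string field, with or without analysis."""
    if legacy:
        mapping = {"type": "string"}
        if not analyzed:
            # Index the value as a single term, skipping the analyzer.
            mapping["index"] = "not_analyzed"
        return mapping
    return {"type": "text"} if analyzed else {"type": "keyword"}


def build_properties(string_fields, analyzed_fields, legacy=True):
    """Analyze only the fields that really need full-text search."""
    return {
        name: string_field_mapping(name in analyzed_fields, legacy)
        for name in string_fields
    }


# Hypothetical columns: only 'description' goes through the slow analyzer.
props = build_properties(["sku", "category", "description"], {"description"})
print(json.dumps({"properties": props}, indent=2))
```

Fields mapped this way can still be matched exactly, but not searched by individual words, so this only helps for columns that do not need segmentation.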

Thanks.
Haiwei.


