xiehaiwei  
                (xiehaiwei)
               
                 
              
                  
                    September 2, 2014,  4:46am
                   
                   
              1 
               
             
            
              Hi all, 
In our ES system, each row of a MySQL table is indexed as a 
document, but indexing is slow.
My questions:
How much faster is indexing with the Bulk API than indexing documents one at a time? 
If ‘word segmentation’ is the bottleneck, how should we deal with it? 
Can I use multiple nodes of an ES cluster to index into a single index in parallel? 
 
Thanks.
-- 
You received this message because you are subscribed to the Google Groups "elasticsearch" group. 
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com . 
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4f7eae49-1bee-4bdd-9a8c-c9d1178fccdc%40googlegroups.com . 
For more options, visit https://groups.google.com/d/optout .
             
            
               
               
               
            
            
           
          
            
            
              Hello,
A few tips from my experience:
1. Disable refresh before bulk indexing and re-enable it once the load is done. By default, ES 
refreshes every 1 second, which makes all documents indexed during 
that interval searchable.
2. Reduce the number of replicas to 0 while bulk indexing. 
3. Increase the number of machines and the number of shards. Indexing 
happens in parallel across shards, so more machines, each holding a shard, will help. 
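
The tips above could be sketched roughly as follows. This is a minimal sketch that only builds request bodies and sends nothing. It assumes the update-settings endpoint (`PUT /<index>/_settings`) and the `_bulk` endpoint. The index name `mysql_rows`, the helper names, and the sample rows are hypothetical. The restored values (`1s`, one replica) are assumed to be what the index used before the load.

```python
import json


def bulk_load_settings():
    """Settings to apply before a large load: no periodic refresh, no replicas."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restore_settings(refresh_interval="1s", replicas=1):
    """Settings to apply after the load (assumed pre-load values)."""
    return {"index": {"refresh_interval": refresh_interval,
                      "number_of_replicas": replicas}}


def bulk_body(index, rows, id_field="id"):
    """Build a newline-delimited JSON body for the _bulk endpoint.

    Each row becomes two lines: an action line and the document itself.
    Assumption: ES 1.x releases from the time of this thread also expect
    a `_type`, given either in the action line or in the request URL.
    """
    lines = []
    for row in rows:
        action = {"index": {"_index": index, "_id": str(row[id_field])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(row))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical rows standing in for lines of the MySQL table.
rows = [{"id": 1, "title": "first row"}, {"id": 2, "title": "second row"}]
body = bulk_body("mysql_rows", rows)
```

The intended order would be: apply `bulk_load_settings()`, send one or more bulk bodies, then apply `restore_settings()`.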
 
"If ’Word Segmentation‘ is the problem" - Please elaborate.
Thanks 
Vineeth
             
            
               
               
               
            
            
           
          
            
              
                xiehaiwei  
                (xiehaiwei)
               
              
                  
                    September 2, 2014,  6:43am
                   
                   
              3 
               
             
            
              Hi,
"If ‘Word Segmentation’ is the problem" - I mean that the word 
segmentation analyzer is slow: 
about 1 MB/s when run on its own. In our case, many fields of a 
document need to be segmented.
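
One way to reduce analysis cost is to run the segmentation analyzer only on fields that need full-text search and to index the rest as exact values. The sketch below builds such a mapping under some assumptions: it uses the ES 1.x `string` type with `"index": "not_analyzed"` (later releases use `text` and `keyword` instead), and the field names and the analyzer name `my_segmenter` are hypothetical.

```python
def build_mapping(analyzed_fields, exact_fields, analyzer="my_segmenter"):
    """Return a mapping body that analyzes only the listed full-text fields."""
    properties = {}
    for field in analyzed_fields:
        # Full-text field: goes through the (slow) segmentation analyzer.
        properties[field] = {"type": "string", "analyzer": analyzer}
    for field in exact_fields:
        # Exact-value field: indexed as a single term, analyzer is skipped.
        properties[field] = {"type": "string", "index": "not_analyzed"}
    return {"properties": properties}


# Hypothetical columns of the MySQL table.
mapping = build_mapping(analyzed_fields=["title", "description"],
                        exact_fields=["sku", "category_code"])
```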
"more machines with a shard" - Will a single shard run on multiple 
nodes? Do you mean within a cluster?
Thanks. 
Haiwei
 
             
            
               
               
               
            
            
           
          
            
            
              Hello Haiwei,
The more hardware you can get, the better, unless the data set is too 
small. 
So if there are 10 machines, set the number of shards to 10 so that the index can 
use all the resources uniformly.
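
The reasoning above can be illustrated with a small sketch. It assumes the cluster spreads primary shards as evenly as possible over the nodes; the helper names are hypothetical, and the settings body is what one would pass when creating the index, since the shard count is chosen at creation time.

```python
def index_settings(num_machines, replicas=0):
    """Index-creation settings with one primary shard per machine."""
    if num_machines < 1:
        raise ValueError("need at least one machine")
    return {"settings": {"number_of_shards": num_machines,
                         "number_of_replicas": replicas}}


def primaries_per_node(num_shards, num_nodes):
    """Even spread of primary shards over nodes (simplifying assumption)."""
    base, extra = divmod(num_shards, num_nodes)
    return [base + 1 if i < extra else base for i in range(num_nodes)]


settings = index_settings(10)
busy = primaries_per_node(10, 10)      # every node holds one shard
partial = primaries_per_node(5, 10)    # half of the nodes hold no shard
```

With fewer shards than machines, some nodes hold no primary shard of the index and do not take part in indexing it.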
Thanks 
Vineeth
             
            
               
               
               
            
            
           
          
            
              
                xiehaiwei  
                (xiehaiwei)
               
              
                  
                    September 2, 2014,  9:05am
                   
                   
              5 
               
             
            
              Hi, mohan
My latest test indexed about 14,000 documents.
 
After tuning the Bulk API parameters, it took 6m. Before tuning, it took 14m. 
[INFO] Total time: 6:06.173s 
[INFO] Finished at: Tue Sep 02 15:40:36 CST 2014 
[INFO] Final Memory: 27M/312M 
ref: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html#bulk 
 
With the analyzer for string fields turned off, it took 18s. 
[INFO] Total time: 18.499s 
[INFO] Finished at: Tue Sep 02 15:52:47 CST 2014 
[INFO] Final Memory: 29M/312M
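
For comparison, the timings above can be turned into rough throughput figures. This is only arithmetic on the numbers reported in this post; the document count is approximate ("about 14000"), and the 14-minute figure has no seconds, so the results are estimates.

```python
DOCS = 14000  # approximate count reported above


def docs_per_second(docs, seconds):
    """Average indexing throughput over a whole run."""
    return docs / seconds


before_tuning = docs_per_second(DOCS, 14 * 60)        # about 14 minutes
after_tuning = docs_per_second(DOCS, 6 * 60 + 6.173)  # 6:06.173
without_analyzer = docs_per_second(DOCS, 18.499)      # 18.499 s

# How much faster the run without analysis is than the tuned run.
speedup = without_analyzer / after_tuning

print(f"before tuning:    {before_tuning:.1f} docs/s")
print(f"after tuning:     {after_tuning:.1f} docs/s")
print(f"without analyzer: {without_analyzer:.1f} docs/s")
print(f"speedup:          {speedup:.1f}x")
```

By these figures the run without string analysis is roughly 20 times faster than the tuned run, which is consistent with analysis dominating the indexing time in this test.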
 
 
So, is analysis of string fields the biggest performance problem when indexing?
Thanks. 
Haiwei.