Indexing is going on very slowly


(Siddharth Gupta) #1

Hello,

I am trying to index 2 lakh (200,000) documents (average size of a single document: 40 kB) on a remote cluster inside our LAN.
Currently I am using the Transport Client to connect to the remote cluster with the following code:

Client client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("10.1.1.5", 9300));

The IP address is that of the remote machine where my cluster resides.

Cluster configuration:
master : master node only
data_node1: master node + data node
data_node2: data node only
data_node3: data node only

When I check my log files I see the following errors repeatedly, and the indexing has now been running for 2 days:

org.elasticsearch.transport.ReceiveTimeoutTransportException: [data_node3][inet[/10.1.1.5:9303]][cluster:monitor/nodes/stats[n]] request_id [715820] timed out after [15000ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

org.elasticsearch.transport.ReceiveTimeoutTransportException: [data_node1][inet[/10.1.1.5:9301]][cluster:monitor/nodes/stats[n]] request_id [704409] timed out after [15000ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

org.elasticsearch.transport.ReceiveTimeoutTransportException: [data_node2][inet[/10.1.1.5:9301]][cluster:monitor/nodes/stats[n]] request_id [704409] timed out after [15000ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)


org.elasticsearch.client.transport.NoNodeAvailableException: No node available
    at org.e.c.t.TransportClientNodesService$RetryListener.onFailure(...)
    at org.e.a.TransportActionNodeProxy$1.handleException(...)
    at org.e.t.TransportService$Adapter$3.run(...)
    ... 3 more

I even tried the link mentioned below, which describes the TCP keepalive parameter at the OS level:

But following the steps in it did not help.

I am facing this error intermittently. The Transport Client works fine sometimes, which rules out firewall or port-related issues.
I am using Elasticsearch 1.7.1.

  • Firewall is not configured
  • TCP and UDP on port 9300 are open
  • Sniffing is disabled (I'm using the default transport configuration)

Can anyone please help me resolve this issue?

Regards,
Siddharth.


(Mark Walkom) #2

Do your logs show heavy GC? What's in your hot_threads and your slow log?


(Siddharth Gupta) #3

Thanks for the reply!
Yes, my logs do show garbage collection, but what counts as heavy garbage collection?
Also, my slow log files are empty.
Can you please suggest what I should do in this situation?

Regards,
Siddharth.


(Mark Walkom) #4

Heavy GC is anything nearing 30 seconds or more.

There's no single solution here; it really depends on how you are doing the indexing, your infrastructure, your data type and lots more. I'd suggest you take a look through some of the other similar topics to see some other approaches.
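
To put a number on "heavy GC", here is a minimal Java sketch that reads the standard JVM garbage collection counters through `java.lang.management` and compares the average pause against the 30 second rule of thumb above. It is only an illustration: it inspects the JVM it runs in, not a remote Elasticsearch node, and the class name `GcPauseCheck`, the helper names and the exact threshold constant are assumptions made for this example. An average can also hide a single very long pause, so the per-collection durations in the node logs remain the better source.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch only: reads GC counters of the local JVM, not of an Elasticsearch node.
class GcPauseCheck {
    // Assumption: threshold taken from the "nearing 30 seconds" rule of thumb.
    static final long HEAVY_GC_MILLIS = 30_000L;

    // Average pause per collection between two counter samples.
    // Returns 0 when no collection happened (or the count is undefined).
    static long averagePauseMillis(long countBefore, long timeBefore,
                                   long countAfter, long timeAfter) {
        long collections = countAfter - countBefore;
        if (collections <= 0) {
            return 0L;
        }
        return (timeAfter - timeBefore) / collections;
    }

    static boolean isHeavy(long pauseMillis) {
        return pauseMillis >= HEAVY_GC_MILLIS;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            long time = gc.getCollectionTime();   // accumulated time in milliseconds
            long avg = averagePauseMillis(0L, 0L, count, time);
            System.out.println(gc.getName() + ": collections=" + count
                    + ", totalMillis=" + time
                    + ", avgMillis=" + avg
                    + ", heavy=" + isHeavy(avg));
        }
    }
}
```

Sampling the counters twice, some minutes apart during indexing, and passing both samples to `averagePauseMillis` gives the average pause for that window only.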


(Siddharth Gupta) #5

Thanks again!
I am now able to get the slow logs for both indexing and searching.
I also noticed that young-generation GC was taking 37s on a single node.
Your replies helped me track down this issue.

Regards,
Siddharth.


(Christian Dahlqvist) #6

I would like to know a bit more about how you are indexing your data. Are you indexing your documents in bulk mode? If so, what is the size of each bulk request? How many concurrent threads are you using for indexing? Are you sending requests to all data nodes?

It would also be useful to get some additional information about the cluster and the hardware it sits on. What is the specification of the nodes with respect to RAM, heap size, CPU and storage? Have you disabled swap on all nodes?


(Siddharth Gupta) #7

Hello,
Thanks for showing interest.

-Yes, we are indexing documents in bulk mode and the bulk size is 50. (This is for testing purposes only; otherwise it was 1000.)
-I have allocated 16 GB of RAM to Elasticsearch out of the 32 GB present in my system.

In the end I have to index around 20 lakh (2,000,000) documents. Each document has an average size of 60 kB.
I indexed 2 lakh documents for testing purposes and faced a lot of problems.

It took almost 10 hours to complete the indexing.
Each document has 5 fields, of which 2 use a shingle analyzer and 3 use a comma analyzer.

-Threads: I am using only one thread to index my data. Should I increase the number? If yes, by how much?
-Yes, I am sending requests to all data nodes.
Please find the details of the cluster and hardware below:

-Configuration of my system:
-32 GB of RAM
-1 TB hard disk
-CPU details:
    product: Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz
    vendor: Intel Corp.
    physical id: 1
    bus info: cpu@0
    size: 2100MHz
    capacity: 2100MHz
    width: 64 bits

I had initially started indexing with the following nodes:

  1. dedicated master node
  2. master + data node
  3. only data node1
  4. only data node2

But after indexing 1 lakh documents the cluster became unstable due to memory constraints, and I had to carry on with one dedicated master node and one data node only.

Please find the JVM details of the data node:

"jvm" : {
  "timestamp" : 1441254434494,
  "uptime_in_millis" : 86426582,
  "mem" : {
    "heap_used_in_bytes" : 3942237432,
    "heap_used_percent" : 22,
    "heap_committed_in_bytes" : 17145004032,
    "heap_max_in_bytes" : 17145004032,
    "non_heap_used_in_bytes" : 45696584,
    "non_heap_committed_in_bytes" : 46006272,
    "pools" : {
      "young" : {
        "used_in_bytes" : 98504376,
        "max_in_bytes" : 279183360,
        "peak_used_in_bytes" : 279183360,
        "peak_max_in_bytes" : 279183360
      },
      "survivor" : {
        "used_in_bytes" : 61248,
        "max_in_bytes" : 34865152,
        "peak_used_in_bytes" : 34865152,
        "peak_max_in_bytes" : 34865152
      },
      "old" : {
        "used_in_bytes" : 3843671808,
        "max_in_bytes" : 16830955520,
        "peak_used_in_bytes" : 3843671808,
        "peak_max_in_bytes" : 16830955520
      }
    }
  }

  • I tried to disable swap on the nodes by setting the option below in elasticsearch.yml:
    bootstrap.mlockall: true
    But the logs showed the error below, so I am not sure whether swapping is disabled or not:

Unable to lock the JVM memory (ENOMEM). This can result in part of the JVM being swapped out. Increase RLIMIT_MEMLOCK (ulimit).

Can you please suggest something?

Regards,
Siddharth.


(Christian Dahlqvist) #8

If you are using only a single host for Elasticsearch, there is generally no need to run multiple nodes unless you have more than 64GB of RAM available. You should therefore be able to run a single node that holds data and is master eligible, with 16GB of the 32GB available assigned to the heap.

Given that your documents are quite large, a bulk size between 50 and 100 is probably more suitable than 1000, especially since you are experiencing memory-related issues.
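
To make the bulk size point concrete, here is a rough Java sketch of the arithmetic. It assumes the 60 kB average document size mentioned earlier in the thread, treats 1 kB as 1000 bytes, and ignores the bulk action metadata and any per-request overhead. The class `BulkSizing`, its helpers and the 5 MB target are hypothetical and exist only for illustration; they are not an Elasticsearch API or an official recommendation.

```java
// Sketch only: raw payload arithmetic, not an Elasticsearch API.
class BulkSizing {
    // Assumption: 1 kB = 1000 bytes.
    static final long KB = 1000L;

    // Approximate raw payload of one bulk request, in bytes.
    static long payloadBytes(int docsPerBulk, long avgDocBytes) {
        return docsPerBulk * avgDocBytes;
    }

    // Hypothetical helper: how many documents fit into a target payload size.
    static int docsPerBulk(long targetBytes, long avgDocBytes) {
        return (int) Math.max(1L, targetBytes / avgDocBytes);
    }

    public static void main(String[] args) {
        long avgDoc = 60 * KB; // average document size reported in the thread
        for (int n : new int[] {50, 100, 1000}) {
            double megabytes = payloadBytes(n, avgDoc) / 1_000_000.0;
            System.out.println(n + " docs per bulk: about " + megabytes + " MB per request");
        }
        // Assumed target of 5 MB per request, chosen only for illustration.
        System.out.println("docs for a 5 MB request: " + docsPerBulk(5_000_000L, avgDoc));
    }
}
```

With these assumptions a bulk of 1000 documents carries about 60 MB per request, while 50 to 100 documents comes to about 3 to 6 MB, which is much easier on the heap.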

When you are indexing, are you able to determine what is limiting throughput and indexing speed? Is it perhaps limited by CPU or Disk I/O? Are you using any swap during indexing?
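
As a baseline for that question, the figures already given in this thread (2 lakh documents in almost 10 hours, 20 lakh documents to index in total) can be turned into a throughput estimate. The Java sketch below is plain arithmetic under the assumption that the rate stays constant; the class and method names are made up for this example.

```java
// Sketch only: back-of-the-envelope throughput from the numbers in the thread.
class IndexingThroughput {
    // Documents indexed per second over a run of the given length.
    static double docsPerSecond(long docs, double hours) {
        return docs / (hours * 3600.0);
    }

    // Hours needed for a document count at a constant rate.
    static double hoursFor(long docs, double docsPerSecond) {
        return docs / docsPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        long testDocs = 200_000L;    // 2 lakh documents from the test run
        long totalDocs = 2_000_000L; // 20 lakh documents planned in total
        double rate = docsPerSecond(testDocs, 10.0);
        System.out.println("observed rate (docs/s): " + rate);
        System.out.println("projected hours for the full set: " + hoursFor(totalDocs, rate));
    }
}
```

That works out to roughly 5.6 documents per second, or about 100 hours for the full 20 lakh documents if nothing changes, so it is worth finding the bottleneck first.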


(Siddharth Gupta) #9

Hello,
Thanks for the reply!

The indexing speed is not that good. In the indexing slow logs I see that some documents take more than 500 ms, and sometimes even 10 seconds, to index.

Yes, swap is being used during indexing. I tried to turn it off through the parameter in elasticsearch.yml (bootstrap.mlockall: true), but it didn't work.
How do I turn it off?

Thanks & Regards,
Siddharth.

