I am using Elasticsearch 6.6.2 to index a CSV file (about 150 MB, roughly a million documents) from my Java application. I read the file with a BufferedReader and use a JSON builder to add each document to a BulkRequest.

When the CSV file is around 10 MB I am able to index it successfully. When the file is larger, I execute the bulk requests in batches; I have tried batch sizes of 100, 1,000, 10,000 and 100,000. The first batch is indexed very quickly (2-4 seconds), but the time taken for subsequent batches grows to minutes, and after the 4th or 5th batch I get the following exception, after which Elasticsearch goes down and I have to restart the service:
```
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][127.0.0.1:9300][cluster:monitor/nodes/liveness] request_id [34] timed out after [5003ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1010) ~[elasticsearch-6.7.1.jar:6.7.1]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-6.7.1.jar:6.7.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_201]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_201]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{YLojUlzuTtmZ7pAmHvWeMw}{localhost}{127.0.0.1:9300}]]
    at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:352)
    at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:248)
    at org.elasticsearch.client.transport.TransportProxyClient.execute(TransportProxyClient.java:60)
    at org.elasticsearch.client.transport.TransportClient.doExecute(TransportClient.java:388)
    at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:403)
    at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:391)
    at org.elasticsearch.client.support.AbstractClient$IndicesAdmin.execute(AbstractClient.java:1262)
    at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:46)
```
The transport client is created once at application startup and held in a static field. The JVM heap is 1 GB.

Please tell me how I should index such large files into Elasticsearch.
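For reference, the batching loop described above can be sketched as follows. This is a minimal, client-agnostic sketch: `BatchedCsvReader`, `indexInBatches` and the flush callback are hypothetical names, and the comment marks where a new BulkRequest would be built and executed for each batch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: reads a CSV line by line and hands off fixed-size batches.
final class BatchedCsvReader {

    // Returns the total number of lines processed; invokes flush once per batch.
    static int indexInBatches(BufferedReader reader, int batchSize,
                              Consumer<List<String>> flush) throws IOException {
        List<String> batch = new ArrayList<>();
        int total = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            batch.add(line);
            if (batch.size() == batchSize) {
                // Here you would build a NEW BulkRequest from this batch and execute it.
                flush.accept(batch);
                total += batch.size();
                batch = new ArrayList<>(); // fresh list (and fresh bulk request) per batch
            }
        }
        if (!batch.isEmpty()) {          // flush the final partial batch
            flush.accept(batch);
            total += batch.size();
        }
        return total;
    }
}
```

The key point the sketch illustrates is that the batch (and the bulk request built from it) is recreated after every flush, rather than reusing one request object that keeps growing across batches.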
Please format your code, logs or configuration files using the </> icon, as explained in this guide, and not the citation button. It will make your post more readable.
Or use markdown style like:
```
CODE
```
This is the icon to use if you are not using markdown format:
There's a live preview panel for exactly this reason.
Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.
Please update your post.
Are you sure you are creating a new bulk request after each bulk has been executed?
Are you using the bulk processor class?
I have a file containing keywords that I need to search for in the index. It is a CSV file with three headers: keyword, brandName, bucketName.
If brandName and bucketName are '*', I need to search for the keyword in all fields of the index; otherwise I need to search for the keyword only where the document's brandName and bucketName match those values. My index contains many fields, including brandName and bucketName. Please advise which search query I should use. I cannot use a multi_match query, since the number of fields per document is not fixed.
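One possible shape for this, as a sketch only: the index name `my-index` and the example values `Acme` and `shoes` are placeholders, and the `term` filters assume brandName and bucketName are keyword-mapped (or have a `.keyword` subfield). In 6.x, a `query_string` query with no explicit fields searches all fields by default, which covers the '*' case:

```
POST my-index/_search
{
  "query": {
    "query_string": { "query": "keyword" }
  }
}
```

For the other case, the same keyword query can be combined with filters on brandName and bucketName in a `bool` query:

```
POST my-index/_search
{
  "query": {
    "bool": {
      "must":   { "query_string": { "query": "keyword" } },
      "filter": [
        { "term": { "brandName": "Acme" } },
        { "term": { "bucketName": "shoes" } }
      ]
    }
  }
}
```

This reads your requirement as "restrict the keyword search to documents whose brandName/bucketName match"; if you instead mean something else by "the fields whose value matches", please clarify.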