Not able to index large CSV files using the Java bulk API

I am using Elasticsearch 6.6.2 to index a CSV file (150 MB, about a million documents) from my Java application. I read the file with a BufferedReader and build each document as JSON with a JSON builder before adding it to a bulk request. When the CSV file is around 10 MB I can index it successfully. For larger files I execute the bulk requests in batches, and I have tried batch sizes of 100, 1,000, 10,000 and 100,000. The first batch is indexed very quickly (2-4 seconds), but the time taken for subsequent batches grows to minutes, and after the 4th or 5th batch I get the following exception, after which Elasticsearch goes down and I have to restart the service:

    org.elasticsearch.transport.ReceiveTimeoutTransportException: [][127.0.0.1:9300][cluster:monitor/nodes/liveness] request_id [34] timed out after [5003ms]
    	at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:1010) ~[elasticsearch-6.7.1.jar:6.7.1]
    	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-6.7.1.jar:6.7.1]
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_201]
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_201]
    	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]


    NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{YLojUlzuTtmZ7pAmHvWeMw}{localhost}{127.0.0.1:9300}]]
    	at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:352)
    	at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:248)
    	at org.elasticsearch.client.transport.TransportProxyClient.execute(TransportProxyClient.java:60)
    	at org.elasticsearch.client.transport.TransportClient.doExecute(TransportClient.java:388)
    	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:403)
    	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:391)
    	at org.elasticsearch.client.support.AbstractClient$IndicesAdmin.execute(AbstractClient.java:1262)
    	at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:46)

The transport client is created once at the start of the application and is static. The JVM heap is 1 GB.
Please tell me how I should index such large files into Elasticsearch.

Please format your code, logs, or configuration files using the </> icon, as explained in this guide, and not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

This is the icon to use if you are not using markdown format:

There's a live preview panel for exactly this reason.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.
Please update your post.

Are you sure you are creating a new bulk request after each bulk has been executed?
Are you using the bulk processor class?

What does your code look like?

I have updated my post. You were right, I was not creating a new bulk request after each bulk was executed. The issue is resolved now. Thanks.
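
For reference, the fixed loop now looks roughly like this (a simplified sketch; the index name, type, field names and CSV layout are placeholders, and `client` is the static transport client):

```
try (BufferedReader reader = new BufferedReader(new FileReader(csvPath))) {
    int batchSize = 10000;
    int count = 0;
    BulkRequestBuilder bulkRequest = client.prepareBulk(); // builder for the current batch
    String line;
    while ((line = reader.readLine()) != null) {
        String[] cols = line.split(",");
        bulkRequest.add(client.prepareIndex("my_index", "_doc")
                .setSource(XContentFactory.jsonBuilder()
                        .startObject()
                        .field("field1", cols[0])
                        .field("field2", cols[1])
                        .endObject()));
        if (++count % batchSize == 0) {
            BulkResponse response = bulkRequest.execute().actionGet();
            if (response.hasFailures()) {
                // log the per-item failures
            }
            // This was the fix: start a brand new bulk request for the next batch
            // instead of keeping on adding to the old one.
            bulkRequest = client.prepareBulk();
        }
    }
    // Flush whatever is left in the last partial batch.
    if (bulkRequest.numberOfActions() > 0) {
        bulkRequest.execute().actionGet();
    }
}
```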

Great. I'd encourage you to use the BulkProcessor class if you are using Java.

An example here:
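
With the 6.x transport client it looks roughly like this, where `client` is your existing TransportClient and the thresholds are only illustrative defaults to tune for your data:

```
BulkProcessor bulkProcessor = BulkProcessor.builder(
        client,
        new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId, BulkRequest request) {
                // called just before each bulk is executed
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                // called after each bulk; check response.hasFailures() here
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                // called when a whole bulk request could not be executed
            }
        })
        .setBulkActions(10000)                              // flush every 10000 actions
        .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB)) // or every 5 MB
        .setFlushInterval(TimeValue.timeValueSeconds(5))    // or every 5 seconds
        .setConcurrentRequests(1)
        .build();
```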

Here is how you index documents:
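
Something like this (index name, type and field names are just placeholders):

```
bulkProcessor.add(new IndexRequest("my_index", "_doc")
        .source(XContentFactory.jsonBuilder()
                .startObject()
                .field("field1", "value1")
                .field("field2", "value2")
                .endObject()));
```

And when you are done reading the file, flush and close it:

```
bulkProcessor.awaitClose(30, TimeUnit.SECONDS);
```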

HTH

Hi David,

I have a file containing keywords that I need to search for in the index. It is a CSV file with three headers: keyword, brandName, bucketName.

If the brandName and bucketName values are '*', I need to search for the keyword in all fields of the index; if they are something else, I need to search for the keyword only in the fields whose values match the bucketName and brandName. My index contains many fields, including brandName and bucketName. Please advise which search query I should use. I cannot use a multi_match query since the number of fields per document is not fixed.

Could you open a new question? This is unrelated to the initial one I think.

Done.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.