We are encountering issues when trying to index large numbers of documents in parallel (more than 100 indexing requests per second). The error we get is "socket hang up". Are we running into limits with our ES instance? We have continually struggled to index large amounts of data with Enterprise Search.
This isn't a lot to go on, so if my suggestions don't help, please share some more details (size of your deployment, details of your indexing requests, error logs for Enterprise Search and Elasticsearch, etc.).
The first thing I'd make sure of is that you're submitting indexing requests at the maximum batch size (100 documents per indexing request). The next thing I'd do is limit the number of concurrent requests you're sending to a single Enterprise Search instance. We use Jetty under the hood, which has a hard limit of 200 threads in its thread pool, and we assign fully half of those (100) to handling requests. So if you have more than 100 concurrent requests in flight to a single Enterprise Search server, some of them are going to be dropped.
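To make that concrete, here's a rough Python sketch of the client-side pattern I'm describing: split your documents into batches of 100 and cap in-flight requests with a bounded worker pool. The base URL, engine name, and API key are placeholders to swap for your own; the endpoint is App Search's standard documents API.

```python
# A minimal sketch: batch documents to App Search's 100-doc limit and bound
# concurrency so we never exceed the request-handling threads on one node.
# BASE_URL, ENGINE, and API_KEY are placeholders, not real values.
import requests
from concurrent.futures import ThreadPoolExecutor

BASE_URL = "http://localhost:3002"   # your Enterprise Search base URL
ENGINE = "my-engine"                 # your App Search engine name
API_KEY = "private-xxxxxxxx"         # an App Search private API key
BATCH_SIZE = 100                     # App Search's max documents per indexing request
MAX_WORKERS = 20                     # stay well under the ~100 request-handling threads

def index_batch(batch):
    """POST one batch of up to 100 documents to the documents API."""
    resp = requests.post(
        f"{BASE_URL}/api/as/v1/engines/{ENGINE}/documents",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=batch,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def index_all(documents):
    # Split the full document list into batches of BATCH_SIZE...
    batches = [documents[i:i + BATCH_SIZE] for i in range(0, len(documents), BATCH_SIZE)]
    # ...and let a bounded thread pool cap how many requests are in flight at once.
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(index_batch, batches))
```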
If you absolutely need a higher volume of concurrent requests, I suggest you spin up additional Enterprise Search servers and put them behind a load balancer.
A final option could be to leverage an Elasticsearch Index Engine to bypass App Search's indexing API and index directly to Elasticsearch. If you choose this option, be careful to first copy the index settings and mappings from an App Search engine to use on your new index before you start indexing data.
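If you go that route, the copy step could look something like this (untested) Python sketch. One big caveat: the ".ent-search-engine-documents-<engine name>" source index name here is my assumption about App Search's internal naming convention, so check what your engine's backing index is actually called in your cluster before copying from it.

```python
# A hedged sketch of copying mappings and settings from an App Search engine's
# backing index onto a new index, so documents indexed directly to Elasticsearch
# get the same analysis and field types. The SOURCE index name is an assumption;
# verify it in your own cluster first.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200", basic_auth=("elastic", "changeme"))

SOURCE = ".ent-search-engine-documents-my-engine"  # assumed internal index name
TARGET = "my-new-index"

mappings = es.indices.get_mapping(index=SOURCE)[SOURCE]["mappings"]
settings = es.indices.get_settings(index=SOURCE)[SOURCE]["settings"]["index"]

# Strip the per-index values Elasticsearch won't accept when creating a new index.
for key in ("uuid", "version", "creation_date", "provided_name"):
    settings.pop(key, None)

es.indices.create(index=TARGET, mappings=mappings, settings={"index": settings})
```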
@Sean_Story Thanks for the response. I believe we are overloading our Enterprise Search instance with too many parallel requests, and that is the cause of our error. We don't need to horizontally scale, since we have no issues handling search requests, just indexing. Going directly to Elasticsearch has a lot of appeal, I think. We are doing all of this through our API right now, and I wonder if we should be using something like Logstash to sync our data from Postgres to Elasticsearch and then use Enterprise Search to build engines from those indexes. Would Logstash allow us to index large amounts of data from Postgres quickly?
Hi @mbrimmer83,
I'm not a Logstash expert, but it does look like the JDBC Input Plugin for Logstash might help you. If you have already written the code to form up documents from Postgres and index to App Search though, it might be faster for you to just swap out the Enterprise Search client for an Elasticsearch client. Lots of ways to skin a cat.
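For example, if you already have code that builds documents from Postgres, the swap might look roughly like this sketch. The table, columns, index name, and connection strings are all made up for illustration; you'd shape the documents however your data actually looks.

```python
# A rough sketch of the "swap the client" approach: stream rows out of Postgres
# with psycopg2 and bulk-index them with the official Elasticsearch client.
# All names and credentials below are placeholders.
import psycopg2
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200", basic_auth=("elastic", "changeme"))
pg = psycopg2.connect("dbname=mydb user=me password=secret host=localhost")

def generate_actions():
    # A named (server-side) cursor streams rows instead of loading them all at once.
    with pg.cursor(name="export") as cur:
        cur.execute("SELECT id, title, body FROM documents")
        for doc_id, title, body in cur:
            yield {
                "_index": "my-new-index",
                "_id": doc_id,
                "_source": {"title": title, "body": body},
            }

# helpers.bulk chunks the actions into _bulk requests for you, so you get
# batching without managing it by hand.
ok, _ = helpers.bulk(es, generate_actions(), chunk_size=500)
print(f"indexed {ok} documents")
```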
You also may be interested to check out the newer Elastic Connector project (docs) (github), which has a Postgres connector coming out in beta right now. Note that the docs for the Postgres connector are only on the main branch right now, as they (at the time of writing) have not been released yet.