I have a syncing utility in Go that moves data from MongoDB to Elasticsearch (ES). Here is the gist of it: create a WaitGroup, launch a goroutine listening on a channel, pull results from Mongo, and publish each one to the channel, where it is indexed into ES by its own goroutine.
    wg := &sync.WaitGroup{}
    go func() {
        for {
            select {
            case data := <-resultFeed:
                go func() {
                    defer wg.Done()
                    pushToES(data)
                }()
            }
        }
    }()
    ... pull from Mongo
    for {
        wg.Add(1)
        resultFeed <- row
    }
    ...
    wg.Wait()
With either the official ES client or olivere, pushing the results to ES from goroutines produces connection errors like:
no available connection: no Elasticsearch node available
read tcp 127.0.0.1:52778->127.0.0.1:9200: read: connection reset by peer
write tcp 127.0.0.1:54947->127.0.0.1:9200: write: broken pipe
These errors are intermittent: some documents are indexed successfully while others fail.
Both ES and Mongo are default local installations. There are no logs produced from ES.
If I don't use a goroutine there are no errors, but it's obviously much slower, and I chose Go specifically for the ability to index concurrently. I haven't tried bulk requests because indexing isn't the only thing happening, so I'd like to keep it per document.
I have no idea about your code, so all of the following is just a crazy assumption, which you are free to dismiss.
When using goroutines, or going async in general, are you capping the number of concurrent requests to Elasticsearch, or do you keep sending new requests with no concurrency limit? A cap would mean, for example, only 5 requests in flight at a time, with the 6th triggered only once another has returned.
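One way to cap in-flight requests on the client side is a buffered channel used as a semaphore. The following is only a minimal sketch, not the asker's actual code: `indexAll` and `fakePush` are hypothetical names, `fakePush` stands in for the `pushToES` call from the question, and the limit of 5 is just the number used in the example above.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// fakePush is a hypothetical stand-in for the pushToES call in the question.
// It fails on empty documents so the error path can be exercised.
func fakePush(doc string) error {
	if doc == "" {
		return errors.New("empty document")
	}
	return nil
}

// indexAll calls push once per document, with at most limit calls running
// at the same time, and returns the errors that push reported.
func indexAll(docs []string, limit int, push func(string) error) []error {
	if limit < 1 {
		limit = 1 // a zero-capacity semaphore would block forever
	}
	sem := make(chan struct{}, limit) // one slot per in-flight request
	var wg sync.WaitGroup
	var mu sync.Mutex // guards errs
	var errs []error

	for _, doc := range docs {
		sem <- struct{}{} // blocks while limit pushes are already in flight
		wg.Add(1)
		go func(doc string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this push returns
			if err := push(doc); err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(doc)
	}
	wg.Wait()
	return errs
}

func main() {
	docs := []string{"a", "b", "", "c", "d", ""}
	errs := indexAll(docs, 5, fakePush)
	fmt.Printf("indexed %d documents, %d failed\n", len(docs)-len(errs), len(errs))
}
```

Because the producer blocks on the semaphore, reading from Mongo slows down to the pace at which ES accepts documents instead of spawning one goroutine per row.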
Interesting. There are no limitations right now. @spinscale Would there be a configuration setting that I can modify to increase the number of requests allowed in ES?
Doing this on the server side is always an arms race IMO. I'd rather solve this on the client side (and maybe have a simple test first, if this is really the problem, before going deeper). If Elasticsearch is overwhelmed with the number of index operations (but the HTTP connection still works), it will tell you.
Also, make sure to reuse your HTTP connections instead of creating new ones for each request.
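To illustrate connection reuse, here is a rough sketch that uses plain `net/http` rather than either Elasticsearch client, which is an assumption made to keep the example self-contained. The helper names `newTransport`, `newClient` and `postJSON`, the index name `my-index`, and the pool size of 8 are all hypothetical. The two points it tries to show are that a single client is shared by all goroutines, and that each response body is fully read and closed so the connection can go back to the idle pool.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"time"
)

// newTransport sizes the connection pool for one busy host.
// Go's default keeps only 2 idle connections per host.
func newTransport(maxConns int) *http.Transport {
	return &http.Transport{
		MaxIdleConns:        maxConns,
		MaxIdleConnsPerHost: maxConns,
		MaxConnsPerHost:     maxConns, // also caps concurrent connections
		IdleConnTimeout:     90 * time.Second,
	}
}

// newClient returns one client intended to be created once and shared
// by every goroutine, instead of building a client per request.
func newClient(maxConns int) *http.Client {
	return &http.Client{
		Timeout:   30 * time.Second,
		Transport: newTransport(maxConns),
	}
}

// postJSON sends one document and drains the response body so the
// underlying connection can be reused for the next request.
func postJSON(client *http.Client, url string, doc []byte) error {
	resp, err := client.Post(url, "application/json", bytes.NewReader(doc))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return err
	}
	if resp.StatusCode >= 300 {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}

func main() {
	client := newClient(8)
	// Hypothetical index name; prints an error if no node is listening.
	err := postJSON(client, "http://127.0.0.1:9200/my-index/_doc", []byte(`{"title":"example"}`))
	fmt.Println("result:", err)
}
```

When using one of the Elasticsearch clients instead, the same idea would apply: construct the client once and pass it to every goroutine rather than creating a new one per document.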