Ruby Ingest Timeout

Hello,

I'm hitting sporadic timeouts when bulk indexing documents into App Search, using the Ruby client.

I have a 3 node cluster setup and I'm giving 4GB of heap to the App Search server.

Here's what my client code looks like:

require 'elastic-app-search'
require 'json'
require 'faker'

client = Elastic::AppSearch::Client.new(
  api_key: 'private-my-key',
  api_endpoint: 'https://my-endpoint/api/as/v1/')

engine_name = 'test'
documents = []

for i in 1..1000000
  doc = {}
  doc['id'] = i
  doc['timestamp'] = Time.now
  doc['author'] = Faker::Book.title
  doc['author'] = Faker::Book.author
  doc['publisher'] = Faker::Book.publisher
  doc['category'] = Faker::Book.genre
  doc['overview'] = Faker::Lorem.paragraphs(number: 1)[0]
  doc['body'] = Faker::Lorem.paragraphs(number: 10).join(' ')

  documents << doc

  if i % 100 == 0
    document_receipts = client.index_documents(engine_name, documents)
    puts "Uploaded #{i} documents"
    documents = []
  end
end

And here's the timeout I'm hitting:

Traceback (most recent call last):
	20: from as-index.rb:12:in `<main>'
	19: from as-index.rb:12:in `each_line'
	18: from as-index.rb:26:in `block in <main>'
	17: from /home/ubuntu/.rbenv/versions/2.6.6/lib/ruby/gems/2.6.0/gems/elastic-app-search-7.7.0/lib/elastic/app-search/client/documents.rb:57:in `index_documents'
	16: from /home/ubuntu/.rbenv/versions/2.6.6/lib/ruby/gems/2.6.0/gems/elastic-app-search-7.7.0/lib/elastic/app-search/request.rb:21:in `post'
	15: from /home/ubuntu/.rbenv/versions/2.6.6/lib/ruby/gems/2.6.0/gems/elastic-app-search-7.7.0/lib/elastic/app-search/request.rb:40:in `request'
	14: from /home/ubuntu/.rbenv/versions/2.6.6/lib/ruby/2.6.0/timeout.rb:108:in `timeout'
	13: from /home/ubuntu/.rbenv/versions/2.6.6/lib/ruby/gems/2.6.0/gems/elastic-app-search-7.7.0/lib/elastic/app-search/request.rb:62:in `block in request'
	12: from /home/ubuntu/.rbenv/versions/2.6.6/lib/ruby/2.6.0/net/http.rb:1470:in `request'
	11: from /home/ubuntu/.rbenv/versions/2.6.6/lib/ruby/2.6.0/net/http.rb:920:in `start'
	10: from /home/ubuntu/.rbenv/versions/2.6.6/lib/ruby/2.6.0/net/http.rb:1472:in `block in request'
	 9: from /home/ubuntu/.rbenv/versions/2.6.6/lib/ruby/2.6.0/net/http.rb:1479:in `request'
	 8: from /home/ubuntu/.rbenv/versions/2.6.6/lib/ruby/2.6.0/net/http.rb:1506:in `transport_request'
	 7: from /home/ubuntu/.rbenv/versions/2.6.6/lib/ruby/2.6.0/net/http.rb:1506:in `catch'
	 6: from /home/ubuntu/.rbenv/versions/2.6.6/lib/ruby/2.6.0/net/http.rb:1509:in `block in transport_request'
	 5: from /home/ubuntu/.rbenv/versions/2.6.6/lib/ruby/2.6.0/net/http/response.rb:29:in `read_new'
	 4: from /home/ubuntu/.rbenv/versions/2.6.6/lib/ruby/2.6.0/net/http/response.rb:40:in `read_status_line'
	 3: from /home/ubuntu/.rbenv/versions/2.6.6/lib/ruby/2.6.0/net/protocol.rb:201:in `readline'
	 2: from /home/ubuntu/.rbenv/versions/2.6.6/lib/ruby/2.6.0/net/protocol.rb:191:in `readuntil'
	 1: from /home/ubuntu/.rbenv/versions/2.6.6/lib/ruby/2.6.0/net/protocol.rb:217:in `rbuf_fill'
/home/ubuntu/.rbenv/versions/2.6.6/lib/ruby/2.6.0/net/protocol.rb:217:in `wait_readable': execution expired (Timeout::Error)

It's very repeatable just not consistent, in terms of when it gets thrown during an ingest run.

What's the best way to track down what's causing this timeout?

I'm trying to ingest millions of documents and the ingest will run for quite a while before it throws one of these exceptions. I do see a 500 in the API Logs section of the Admin UI, with the following response:

Response Body
{
  "errors": [
    "Internal server error. Please check your application server logs for more details and contact Elastic support if the problem persists"
  ]
}

I then go look through /var/log/enterprise-search/app-server.log.* for the time window the error occurred around. It looks like this is the culprit:

...
[2020-06-18T18:24:27.755+00:00][10820][2372][app-server][ERROR]: [24f10217-8383-44cf-90fe-e902d7dfb0ff] /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/gems/gems/manticore-0.6.4-java/lib/manticore/response.rb:122:in `body': Could not read from stream: Read timed out (Manticore::StreamClosedException)
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/gems/gems/manticore-0.6.4-java/lib/faraday/adapter/manticore.rb:80:in `block in call'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/gems/gems/manticore-0.6.4-java/lib/manticore/response.rb:268:in `handleResponse'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/gems/gems/manticore-0.6.4-java/lib/manticore/response.rb:50:in `call'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/gems/gems/manticore-0.6.4-java/lib/faraday/adapter/manticore.rb:97:in `call'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/app/middleware/request_logging_middleware.class:60:in `call'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/gems/gems/faraday-0.15.2/lib/faraday/rack_builder.rb:143:in `build_response'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/gems/gems/faraday-0.15.2/lib/faraday/connection.rb:387:in `run_request'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/lib/swiftype/es/client.class:52:in `es_http_request'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/lib/swiftype/es/client.class:37:in `request'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/lib/swiftype/es/node.class:119:in `request'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/lib/swiftype/es/index.class:36:in `request'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/lib/swiftype/es/index.class:278:in `bulk_request'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/lib/swiftype/es/index.class:303:in `bulk_create'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/lib/actastic/schema.class:47:in `bulk_create'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/lib/swiftype/document_storage/backends/elasticsearch_backend.class:354:in `bulk_action'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/lib/swiftype/document_storage/backends/elasticsearch_backend.class:398:in `block in save_constraints!'
        from org/jruby/RubyArray.java:1814:in `each'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/lib/swiftype/document_storage/backends/elasticsearch_backend.class:375:in `save_constraints!'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/lib/swiftype/document_storage/backends/elasticsearch_backend.class:430:in `save_documents_with_constraints'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/lib/swiftype/document_storage/backends/elasticsearch_backend.class:95:in `save_documents'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/lib/swiftype/documents.class:106:in `save_documents'
        from /tmp/jetty-192.168.192.4-3002-enterprise-search.war-_-any-6648671116298572214.dir/webapp/WEB-INF/app/models/engine/documents_concern.class:20:in `save_documents'
...

It's hard to determine what record is causing the error and why.

I'm bulk inserting 100 documents at a time, trying to get to ~5M documents in App Search.

On average, I'll get around 500K documents ingested in a successful run. But after a few hours of the job running, I'll come back to the client returning a client-side timeout (with no other error) and this (above) in the server-side logs. I restart the job and it times out at a different place, after running again for a few hours.

Any suggestions?

The solution here is to set the overall_timeout parameter. It defaults to 15 seconds.

client = Elastic::WorkplaceSearch::Client.new(overall_timeout: 30)

I set mine to 300 and was able to ingest without hitting that Timeout error.

1 Like