I have installed Elasticsearch using the instructions from here. I run it using systemd. I have also made the following changes to the configuration:
- Disabled swap
- Changed the heap size to 20 GB
- Set the refresh interval to -1 (see the sketch after this list)
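For context, the refresh-interval change can be applied dynamically through the Elasticsearch Python client; here is a minimal sketch (the host and index name are illustrative, matching the setup described below):

from elasticsearch import Elasticsearch

client = Elasticsearch("http://myserver:9200")

# Disable automatic refreshes while bulk indexing; set this back to a
# positive value (e.g. "1s") once indexing is done
client.indices.put_settings(
    index="corpora",
    body={"index": {"refresh_interval": "-1"}},
)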
Here's the config file:
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what you are trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: search
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: myserver
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /extract/elasticsearch/data
#path.data: /var/data/elasticsearch
#
# Path to log files:
#
path.logs: /extract/elasticsearch/log
#path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 0
#
# Set a custom port for HTTP:
#
#http.port: 9201
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["0.0.0.0","myserver"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["myserver"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
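To confirm that memory locking and the heap size actually took effect, the nodes info API can be queried, roughly like this (a sketch using the same illustrative host as above):

from elasticsearch import Elasticsearch

client = Elasticsearch("http://myserver:9200")

# mlockall reports whether bootstrap.memory_lock succeeded;
# heap_max_in_bytes reflects the configured JVM heap
info = client.nodes.info()
for node_id, node in info["nodes"].items():
    print(node["name"], node["process"]["mlockall"],
          node["jvm"]["mem"]["heap_max_in_bytes"])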
I have already indexed 2.0 billion documents. However, during indexing, and sometimes during searching, I receive timeout errors. I index the documents with Python using multiprocessing and the official Elasticsearch Python client:
streaming_bulk(client=client, index="corpora", actions=generator(), request_timeout=60)
(inserting documents one by one has the same issue, and searches sometimes time out as well). The server I am using has 180 GB of memory and 88 CPUs. Elasticsearch is not the only process on this server, but it is the main one.
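For completeness, the indexing loop looks roughly like this; note that streaming_bulk returns a lazy generator, so nothing is sent until it is iterated (the generator body and host below are illustrative stand-ins):

from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

client = Elasticsearch("http://myserver:9200")

def generator():
    # Illustrative stand-in for the real document source
    for i in range(1000):
        yield {"_id": i, "text": "document %d" % i}

# Each iteration sends/acknowledges one action; with the default
# raise_on_error=True, a failed document raises an exception
for ok, result in streaming_bulk(client=client, index="corpora",
                                 actions=generator(), request_timeout=60):
    pass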
Here's the index status from curl -X GET "myserver:9200/_cat/indices":
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open corpora-split tMdjFHfMR1OYfM_WptI0Kw 5 1 2005293875 0 672.2gb 672.2gb
Are there any suggestions on how I can avoid these timeout errors?