Elasticsearch multithreading problems and how to improve Elasticsearch performance

Hello everyone. I have 15.1 million documents and a problem with Elasticsearch multithreading: I'm running 46 aggregations (count, cardinality, date_histogram) together, and it takes about 20-45 seconds. That's far too long for us; my expectation is about 2-5 seconds. Any help is much appreciated.

Cluster: 4-core, 16 GB RAM server.

I use the low-level Python client library, elasticsearch-py.

Query and Code Execution Time:
Full ES query

Part of the code (view full code):

start = time.time()
client.search(index=['nginx*'], doc_type=None, body=main_query)
print('Standard Search Query Execution Time: ', time.time() - start)

Output: Standard Search Query Execution Time:  19.81995415687561s

start = time.time()
client.msearch(body=msearch_query)
print('Standard Multi Search Query Execution Time: ', time.time() - start)

Output: Standard Multi Search Query Execution Time:  19.14345407485962s
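For reference, one way to structure an msearch body is one sub-request per aggregation with `size: 0`, so only aggregation results come back and each sub-request is a small, independent search. A minimal sketch; the aggregation names and fields below are hypothetical stand-ins, not the real 46:

```python
import json

def build_msearch_body(index, aggs):
    """Return an NDJSON msearch body with one size=0 sub-request per aggregation."""
    lines = []
    for name, agg in aggs.items():
        lines.append(json.dumps({"index": index}))          # sub-request header
        lines.append(json.dumps({"size": 0, "aggs": {name: agg}}))  # agg-only body
    return "\n".join(lines) + "\n"

# Hypothetical aggregations standing in for the 46 used in the real query.
aggs = {
    "status_counts": {"terms": {"field": "status"}},
    "unique_ips": {"cardinality": {"field": "remote_addr"}},
    "requests_over_time": {
        "date_histogram": {"field": "@timestamp", "interval": "1h"}
    },
}
body = build_msearch_body("nginx*", aggs)
# client.msearch(body=body)  # requires a live cluster
```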

start = time.time()
with ThreadPoolExecutor(50) as ex:
    ex.map(lambda q: client.search(*q), iterable)
print('Search Query in Multi Threading Execution Time: ', time.time() - start)

Output: Search Query in Multi Threading Execution Time:  20.80817937850952s
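As an aside on the `ThreadPoolExecutor` variant above: `ex.map` returns a lazy iterator, and an exception raised inside the worker only surfaces when its result is consumed, so failed searches can pass silently. A sketch that submits futures and collects results and errors explicitly; a stand-in `fake_search` replaces `client.search` here so the example runs without a cluster:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_search(query):
    """Stand-in for client.search(**query); one query raises to show error handling."""
    if query.get("fail"):
        raise RuntimeError("simulated shard failure")
    return {"took": 5, "query": query}

queries = [{"id": 1}, {"id": 2, "fail": True}, {"id": 3}]

results, errors = [], []
with ThreadPoolExecutor(max_workers=10) as ex:
    # Map each future back to its query so failures can be reported usefully.
    futures = {ex.submit(fake_search, q): q for q in queries}
    for fut in as_completed(futures):
        try:
            results.append(fut.result())
        except Exception as exc:
            errors.append((futures[fut], exc))

print(len(results), len(errors))  # counts of successes and failures
```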

start = time.time()
jobs = [Thread(target=client.search, args=arg) for arg in iterable]
# start threads
for job in jobs:
    job.start()
# join threads
for job in jobs:
    job.join()
print('Search Query in Standard Multi Threading Execution Time: ', time.time() - start)

Output: Search Query in Standard Multi Threading Execution Time:  21.506370544433594s

Elasticsearch configuration (elasticsearch.yml):

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

node.master: true
node.data: true

http.host: 0.0.0.0
network.host: 0.0.0.0

script.painless.regex.enabled: true
http.max_initial_line_length: 10K

cloud:
  gce:
    project_id: myproj-es
    zone: europe-west1-b
discovery:
  zen.hosts_provider: gce
  zen.minimum_master_nodes: 2

I tried to use multithreading to improve search performance, but it had no effect, and I noticed something that may explain why.

For example, when I use multithreading, I see that every new request has to wait for the previous one to finish. Why is that, and how can I solve it? How can I configure Elasticsearch so that a new request doesn't have to wait for the previous one?
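One client-side factor that can cause exactly this kind of queuing, assuming default client settings: elasticsearch-py keeps only about 10 persistent connections per node by default, so with 50 threads most requests wait for a free connection. The pool size can be raised with the `maxsize` argument; the node addresses below are hypothetical:

```python
from elasticsearch import Elasticsearch

# Hypothetical node addresses; adjust to the real cluster.
client = Elasticsearch(
    ["http://es-node-1:9200", "http://es-node-2:9200"],
    maxsize=50,  # connections kept per node; match the thread pool size
    timeout=60,  # seconds; long aggregations can exceed the 10 s default
)
```

This is only a sketch of the client configuration, not a confirmed diagnosis; if the cluster's CPUs are already saturated, sending more concurrent requests will not make them finish sooner.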


What is the solution, or how can I write an optimal query in this situation?

Is it possible to run many aggregations simultaneously, as shown in the examples above?

Thanks in advance.

Have you identified what is limiting performance in your Elasticsearch cluster? Is disk I/O saturated so you are seeing a lot of iowait? Is CPU saturated and constantly at 100% when you are running the query?

Hello Christian, thanks for the answer.
I have two clusters, each with two nodes of 4 cores and 16 GB RAM, so four nodes in total.
I checked the CPU load while running the query with multithreading; see the pictures below.
For example:

with ThreadPoolExecutor(50) as ex:
     ex.map(lambda q: client.search(*q), iterable)

cluster 1

cluster 2

But there is plenty of free RAM, so why is that? Or is my jvm.options configuration wrong?

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
-Xms8g
-Xmx8g

Is that right?

Please do not paste images of text as it is very hard to read. From what I can see it looks like all 4 CPU cores are saturated, although it is hard to see how much of this is iowait. What does iostat give?

Okay Christian, thanks once again :slight_smile:

Cluster 1 stats, from GET /_nodes/stats/_all?pretty:

 "fs" : {
  "timestamp" : 1535446522546,
  "total" : {
    "total_in_bytes" : 103880232960,
    "free_in_bytes" : 88270123008,
    "available_in_bytes" : 88253345792
  },
  "least_usage_estimate" : {
    "path" : "/var/lib/elasticsearch/nodes/0",
    "total_in_bytes" : 103880232960,
    "available_in_bytes" : 88253349888,
    "used_disk_percent" : 15.04317291819828
  },
  "most_usage_estimate" : {
    "path" : "/var/lib/elasticsearch/nodes/0",
    "total_in_bytes" : 103880232960,
    "available_in_bytes" : 88253349888,
    "used_disk_percent" : 15.04317291819828
  },
  "data" : [
    {
      "path" : "/var/lib/elasticsearch/nodes/0",
      "mount" : "/ (/dev/sda1)",
      "type" : "ext4",
      "total_in_bytes" : 103880232960,
      "free_in_bytes" : 88270123008,
      "available_in_bytes" : 88253345792
    }
  ],
  "io_stats" : {
    "devices" : [
      {
        "device_name" : "sda1",
        "operations" : 316674,
        "read_operations" : 155410,
        "write_operations" : 161264,
        "read_kilobytes" : 4882556,
        "write_kilobytes" : 3050428
      }
    ],
    "total" : {
      "operations" : 316674,
      "read_operations" : 155410,
      "write_operations" : 161264,
      "read_kilobytes" : 4882556,
      "write_kilobytes" : 3050428
    }
  }
},


"fs" : {
  "timestamp" : 1535446521380,
  "total" : {
    "total_in_bytes" : 103880232960,
    "free_in_bytes" : 88530046976,
    "available_in_bytes" : 88513269760
  },
  "data" : [
    {
      "path" : "/var/lib/elasticsearch/nodes/0",
      "mount" : "/ (/dev/sda1)",
      "type" : "ext4",
      "total_in_bytes" : 103880232960,
      "free_in_bytes" : 88530046976,
      "available_in_bytes" : 88513269760
    }
  ],
  "io_stats" : {
    "devices" : [
      {
        "device_name" : "sda1",
        "operations" : 212440,
        "read_operations" : 151995,
        "write_operations" : 60445,
        "read_kilobytes" : 4926748,
        "write_kilobytes" : 822724
      }
    ],
    "total" : {
      "operations" : 212440,
      "read_operations" : 151995,
      "write_operations" : 60445,
      "read_kilobytes" : 4926748,
      "write_kilobytes" : 822724
    }
  }
},

Cluster 2 stats, from GET /_nodes/stats/_all?pretty:

"fs" : {
  "timestamp" : 1535447090560,
  "total" : {
    "total_in_bytes" : 103880232960,
    "free_in_bytes" : 88270102528,
    "available_in_bytes" : 88253325312
  },
  "least_usage_estimate" : {
    "path" : "/var/lib/elasticsearch/nodes/0",
    "total_in_bytes" : 103880232960,
    "available_in_bytes" : 88253333504,
    "used_disk_percent" : 15.043188690207572
  },
  "most_usage_estimate" : {
    "path" : "/var/lib/elasticsearch/nodes/0",
    "total_in_bytes" : 103880232960,
    "available_in_bytes" : 88253333504,
    "used_disk_percent" : 15.043188690207572
  },
  "data" : [
    {
      "path" : "/var/lib/elasticsearch/nodes/0",
      "mount" : "/ (/dev/sda1)",
      "type" : "ext4",
      "total_in_bytes" : 103880232960,
      "free_in_bytes" : 88270102528,
      "available_in_bytes" : 88253325312
    }
  ],
  "io_stats" : {
    "devices" : [
      {
        "device_name" : "sda1",
        "operations" : 317064,
        "read_operations" : 155411,
        "write_operations" : 161653,
        "read_kilobytes" : 4882560,
        "write_kilobytes" : 3052984
      }
    ],
    "total" : {
      "operations" : 317064,
      "read_operations" : 155411,
      "write_operations" : 161653,
      "read_kilobytes" : 4882560,
      "write_kilobytes" : 3052984
    }
  }
},


"fs" : {
  "timestamp" : 1535447089779,
  "total" : {
    "total_in_bytes" : 103880232960,
    "free_in_bytes" : 88530034688,
    "available_in_bytes" : 88513257472
  },
  "data" : [
    {
      "path" : "/var/lib/elasticsearch/nodes/0",
      "mount" : "/ (/dev/sda1)",
      "type" : "ext4",
      "total_in_bytes" : 103880232960,
      "free_in_bytes" : 88530034688,
      "available_in_bytes" : 88513257472
    }
  ],
  "io_stats" : {
    "devices" : [
      {
        "device_name" : "sda1",
        "operations" : 212693,
        "read_operations" : 152039,
        "write_operations" : 60654,
        "read_kilobytes" : 4927296,
        "write_kilobytes" : 824376
      }
    ],
    "total" : {
      "operations" : 212693,
      "read_operations" : 152039,
      "write_operations" : 60654,
      "read_kilobytes" : 4927296,
      "write_kilobytes" : 824376
    }
  }
},

I am looking for the output of iostat, not Elasticsearch statistics.

Sorry, here are the results now.

I used the commands:
$ iostat
$ iostat -d 2 10
$ iostat -x hda hdb 2 10
$ iostat -p sda 2 10
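If screenshots are awkward to share, the raw counters behind iostat can also be read from /proc/diskstats (Linux only). A small sketch that parses the completed read/write counts for one device; the sample line below is illustrative, not taken from this cluster:

```python
def parse_diskstats(text, device="sda"):
    """Extract completed read/write counts for `device` from /proc/diskstats text.

    Field layout per the kernel docs: major, minor, name, reads completed,
    reads merged, sectors read, ms reading, writes completed, ...
    """
    for line in text.splitlines():
        fields = line.split()
        if len(fields) > 7 and fields[2] == device:
            return {"reads": int(fields[3]), "writes": int(fields[7])}
    return None

# Illustrative sample line; on a real host read the file instead:
#   with open("/proc/diskstats") as f: text = f.read()
sample = "   8       0 sda 155410 120 9765112 3100 161264 80 6100856 4200 0 2500 7300"
print(parse_diskstats(sample))
```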

Disks look largely idle, so it seems your bottleneck is CPU. I would therefore recommend scaling the cluster up or out.

I'm sorry, the previous results were useless because no query was running at the time.
I have updated the results, please take a look.

That looks like a lot of iowait. What type of storage/disks are you using?

It looks like the disk type is HDD. What is the solution?

# SSD - 0
# HDD - 1
root@es-group-22xc:/home/gogua# cat /sys/block/sda/queue/rotational
1

# where ROTA means rotational device (1 if true, 0 if false)
root@es-group-22xc:/home/gogua# lsblk -d -o name,rota
NAME  ROTA
loop0    1
loop1    1
loop3    1
loop4    1
loop5    1
sda      1

root@es-group-22xc:/home/gogua# lshw -short -C disk
H/W path      Device      Class      Description
================================================
/0/1/0.1.0    /dev/sda    disk       107GB PersistentDisk

root@logmind-es-group-22xc:/home/gogua# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: Google   Model: PersistentDisk   Rev: 1   
  Type:   Direct-Access                    ANSI  SCSI revision: 06

root@es-group-22xc:/home/gogua# hdparm -I /dev/sda

/dev/sda:
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ATA device, with non-removable media
Standards:
	Likely used: 1
Configuration:
	Logical		max	current
	cylinders	0	0
	heads		0	0
	sectors/track	0	0
	--
	Logical/Physical Sector size:           512 bytes
	device size with M = 1024*1024:           0 MBytes
	device size with M = 1000*1000:           0 MBytes 
	cache/buffer size  = unknown
Capabilities:
	IORDY not likely
	Cannot perform double-word IO
	R/W multiple sector transfer: not supported
	DMA: not supported
	PIO: pio0 

root@es-group-22xc:/home/gogua# lshw -class disk -class storage
  *-scsi                    
       physical id: 1
       logical name: scsi0
     *-disk
          description: SCSI Disk
          product: PersistentDisk
          vendor: Google
          physical id: 0.1.0
          bus info: scsi@0:0.1.0
          logical name: /dev/sda
          version: 1
          size: 100GiB (107GB)
          capabilities: gpt-1.00 partitioned partitioned:gpt
          configuration: ansiversion=6 guid=76f45aa5-496c-4018-bb70-e5450640be1f logicalsectorsize=512 sectorsize=4096

root@es-group-22xc:/home/gogua# time for i in `seq 1 1000`; do     dd bs=4k if=/dev/sda count=1 skip=$(( $RANDOM * 128 )) >/dev/null 2>&1; done

real	0m10.720s
user	0m1.218s
sys	0m0.720s

Upgrade to SSD or scale out to more spinning disks and/or nodes?

Thank you very much for your help

Hello Christian,

I upgraded from HDD to SSD, but I have the same problem: every new request has to wait for the previous one to finish, and multithreading doesn't help.

Can you take a look at the iostat results?

What can I do to solve this problem?

I am not sure I understand what you mean by this. Can you explain further? What does CPU usage look like now that you have upgraded the storage?