Acceptable Search Performance

I am using Elasticsearch to index and search around 6 million HTML
documents. For this I created an index with 10 shards. Initially I was
running just one node on my physical machine (details below), with a 9 GB
heap allocated to Elasticsearch. I am running only simple search queries
(nothing fancy). You can see a typical search query here:
https://gist.github.com/devashishtyagi/5909747.
The average response size from Elasticsearch is ~80 KB. To test
Elasticsearch's performance, I created an Apache JMeter test. The test
reads random words from a search term file I supplied and fetches the
response from Elasticsearch. The JMeter tests were run from a separate
machine located close by (so there is not much network overhead).
These are the results I got:

  1. 10 threads, 50 requests per thread - 2.3 QPS and average response
    time of > 4 sec.
  2. 5 threads, 100 requests per thread - 3.4 QPS and average response
    time of > 1 sec.

Here are some of my index statistics

  • Number of shards - 10
  • Number of documents - 5174688
  • Size of index - 56 GB
  • Size of a typical shard - 5.5 GB
  • Number of replicas - 0

My machine configuration

Amazon EC2 m3.xlarge

  • RAM - 15 GB
  • Compute Units - 13
  • Hard Drive - 1 TB EBS Drive

I went through several search-performance-related mails on the group, and it
feels like I am getting subpar performance. Or is this acceptable
search performance?

During my tests I found that Elasticsearch was bottlenecked on
disk I/O. So I added 3 more EBS drives to the same machine and started 3
new Elasticsearch nodes on it. I now had 4 Elasticsearch nodes
running on the same server. Here are the performance test results with this
configuration:

  1. 10 threads, 50 requests per thread - 11.6 QPS and average response
    time of ~ 842 ms.
  2. 7 threads, 100 requests per thread - 13.4 QPS and average response
    time of ~ 512 ms.
  3. 8 threads, 100 requests per thread - 14.3 QPS and average response
    time of ~ 550 ms.
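
As a rough sanity check on the figures above, throughput and latency in a load test like this are tied together by Little's law: average latency is roughly the number of concurrent threads divided by QPS. The sketch below assumes each JMeter thread sends its next request as soon as the previous response arrives (no think time), which the post does not state. Under that assumption the implied latencies are about 4.3 s and 1.5 s for the single-node runs, and about 0.86 s, 0.52 s and 0.56 s for the four-node runs, in line with the reported averages.

```python
# Little's law for a closed-loop load test: latency ~= concurrency / throughput.
# Assumption (not stated in the post): threads issue requests back to back,
# with no think time between them.

def implied_latency_seconds(threads, qps):
    """Average latency implied by a thread count and a measured QPS."""
    return threads / qps

# (label, threads, reported QPS) taken from the results in the post.
RUNS = [
    ("1 node, 10 threads", 10, 2.3),
    ("1 node, 5 threads", 5, 3.4),
    ("4 nodes, 10 threads", 10, 11.6),
    ("4 nodes, 7 threads", 7, 13.4),
    ("4 nodes, 8 threads", 8, 14.3),
]

if __name__ == "__main__":
    for label, threads, qps in RUNS:
        latency = implied_latency_seconds(threads, qps)
        print(f"{label}: implied average latency {latency:.2f} s")
```

Since latency and QPS move together this way, adding threads beyond the point where the disks are saturated mostly raises latency rather than throughput.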

Although this is a huge improvement, even with 4 drives
Elasticsearch is still bottlenecked on disk I/O. Is this expected?

P.S. I have come across various posts where it is mentioned that routing
greatly improves performance but I have no idea how to use that in my use
case.

Thanks in advance,
Devashish Tyagi

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Quick analysis

  • You have highlighting on. This feature tends to work the disk heavily
    (doc fetching).
  • You have size = 100. This is an extreme setting, and it creates
    additional work for highlighting.
  • To optimize highlighting, there are options that you do not use yet
    (hint: term vector, fast vector highlighter):
    http://www.elasticsearch.org/guide/reference/api/search/highlighting/
  • Do not start more than one node per machine; there is not much sense
    in it.
  • EBS drives are known to be slow (they go over 1 Gbit network channels).
    ES is built to scale over machines, not only over the number of drives,
    so use more machines.
  • And, finally, use "query" instead of "filtered query" in the query
    unless you know what you want to test (you simply thrash your very large
    filter cache when load testing, which is bad for overall performance).

Jörg
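
The highlighting and query suggestions in the reply above could look roughly like the sketch below. It only builds the JSON bodies and sends nothing to a cluster. The field name "content" and the type name "page" are hypothetical, since the thread does not give the real mapping, and the match query stands in for whatever the original gist used. Storing term vectors with positions and offsets is what lets Elasticsearch use the fast vector highlighter on that field. Existing documents would need to be reindexed for the mapping change to take effect.

```python
import json

# Hypothetical names: the thread does not show the real type or field names.
DOC_TYPE = "page"
FIELD = "content"


def build_mapping(doc_type=DOC_TYPE, field=FIELD):
    """Mapping that stores term vectors with positions and offsets,
    the prerequisite for the fast vector highlighter."""
    return {
        doc_type: {
            "properties": {
                field: {
                    "type": "string",
                    "term_vector": "with_positions_offsets",
                }
            }
        }
    }


def build_search(term, field=FIELD, size=10):
    """Search body using a plain query (no filtered query) and a small size,
    so fewer documents have to be fetched and highlighted per request."""
    return {
        "size": size,
        "query": {"match": {field: term}},
        "highlight": {"fields": {field: {}}},
    }


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
    print(json.dumps(build_search("elasticsearch"), indent=2))
```

The size of 10 is an arbitrary illustration; the point is only that each extra hit adds a document fetch and a highlighting pass.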
