Elasticsearch vs Solr vs Sensei

I've implemented a project to compare the performance of
Elasticsearch, Sensei and Solr:
GitHub - vzhabiuk/search-perf

I was testing Elasticsearch version 0.18.7, configured to use only
1 shard. 2.5 million documents were put into the index; after that I
launched an indexing process to add another 500k docs. At the same
time I launched a concurrent client that issued the following query:
{"facets":{"tags":{"terms":{"field":"tags","size":300}},"color":
{"terms":{"field":"color","size":300}}},
"filter":{"or":{"filters":
[{"and":{"filters":[{"or":{"filters":[{"term":{"color":"red"}},
{"term":{"color":"green"}},{"term":{"color":"blue"}},{"term":
{"color":"yellow"}}]}},{"not":{"filter":{"or":{"filters":[{"term":
{"color":"gold"}},{"term":{"color":"black"}},{"term:
{"color":"white"}}]}}}}]}},
{"and":{"filters":[{"or":{"filters":[{"term":{"tags":"expensive"}},
{"term":{"tags":"electric"}},{"term":{"tags":"hybrid"}},{"term":
{"tags":"soccer mom"}}]}},{"not":{"filter":{"or":{"filters":[{"term":
{"tags":"highend"}},{"term": {"tags":"navigation"}},{"term":
{"tags":"reliable"}}]}}}}]}},
{"range":{"price":
{"to":"9900","include_lower":true,"include_upper":true,"from":"6800"}}},
{"range":{"year":
{"to":"1998","include_lower":true,"include_upper":true,"from":"1997"}}},
{"prefix":{"makemodel":"european"}}]}}}
The query contains a top-level "OR" filter consisting of 2 boolean term
clauses, 2 ranges and 1 prefix. It is designed to hit ~60-70% of all docs.
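
For reference, a minimal sketch of this kind of concurrent client in Python,
assuming Elasticsearch listens on localhost:9200 and the documents live in a
hypothetical index named "cars"; the filter is abbreviated to the range and
prefix clauses, and the thread and request counts are arbitrary:

import json
import threading
import time
import urllib.request

# Hypothetical endpoint and index name -- adjust to the actual setup.
SEARCH_URL = "http://localhost:9200/cars/_search"

# Abbreviated version of the query above (facets plus part of the OR filter).
QUERY = json.dumps({
    "facets": {
        "tags":  {"terms": {"field": "tags",  "size": 300}},
        "color": {"terms": {"field": "color", "size": 300}},
    },
    "filter": {"or": {"filters": [
        {"range": {"price": {"from": "6800", "to": "9900"}}},
        {"range": {"year":  {"from": "1997", "to": "1998"}}},
        {"prefix": {"makemodel": "european"}},
    ]}},
}).encode("utf-8")

def worker(n_requests, latencies):
    # Issue n_requests searches and record per-request latency in seconds.
    for _ in range(n_requests):
        start = time.time()
        req = urllib.request.Request(SEARCH_URL, data=QUERY,
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req).read()
        latencies.append(time.time() - start)

def run(num_threads, requests_per_thread=50):
    latencies = []
    threads = [threading.Thread(target=worker, args=(requests_per_thread, latencies))
               for _ in range(num_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - start
    latencies.sort()
    # Report approximate median latency and overall throughput (qps).
    print("threads=%d  median=%.0fms  qps=%.1f" % (
        num_threads, 1000 * latencies[len(latencies) // 2], len(latencies) / elapsed))

if __name__ == "__main__":
    for n in range(1, 7):
        run(n)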

Here is the performance result:
#Threads   min         median      mean        75%         qps
1          939.09ms    1252.29ms   1224.64ms   1304.76ms   0.8
2          865.28ms    1096.68ms   1105.68ms   1231.89ms   1.8
3          742.27ms    1134.97ms   1117.09ms   1254.68ms   2.6
4          813.23ms    1172.67ms   1208.84ms   1376.10ms   3.2
5          939.43ms    1226.31ms   1218.19ms   1333.88ms   4.1
6          883.99ms    1331.10ms   1325.30ms   1501.60ms   4.2

If there is no indexing process running in the background, the result
is as follows:
#Threads   min          median      mean        75%         qps
1          835.77ms     925.61ms    969.77ms    1090.47ms   1.0
2          871.71ms     1056.00ms   1102.22ms   1225.51ms   1.8
3          812.36ms     1094.02ms   1098.68ms   1179.90ms   2.7
4          849.71ms     1106.34ms   1173.51ms   1324.26ms   3.3
5          819.68ms     1115.03ms   1183.98ms   1393.15ms   4.2
6          971.61ms     1332.58ms   1346.59ms   1472.38ms   4.3
7          995.61ms     1377.42ms   1397.70ms   1565.51ms   4.8
8          1010.08ms    1359.56ms   1399.52ms   1610.80ms   5.3
9          1142.76ms    1675.27ms   1708.01ms   2040.13ms   4.8

I've got three questions so far:

  1. With background indexing we hit a concurrency bottleneck at only
    4 querying threads. Is something wrong with my setup?
  2. How can I tune Elasticsearch to get better results?
  3. What, in your opinion, is the preferred type of query to use for
    the benchmark?

With many thanks,
Volodymyr

BTW, here is the spec of my machine:
RedHat 6.1
Intel Xeon E5620 @ 2.40 GHz, 8 cores
63 GB RAM

Do you have replicas configured? What do you mean by "background indexing",
and how do you define "bottleneck"? Without replicas, indexing will impact
search performance. Try with replicas so that indexing and search can be
distributed across different Lucene indexes.
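
For illustration, a minimal sketch of turning on one replica for an existing
index via the index settings API (the "cars" index name and localhost endpoint
are assumptions; note that on a single node the extra replica will simply stay
unassigned, so this only pays off once a second node joins the cluster):

import json
import urllib.request

# Hypothetical index name and endpoint; number_of_replicas can be updated
# on a live index through the index settings API.
SETTINGS_URL = "http://localhost:9200/cars/_settings"
body = json.dumps({"index": {"number_of_replicas": 1}}).encode("utf-8")
req = urllib.request.Request(SETTINGS_URL, data=body, method="PUT",
                             headers={"Content-Type": "application/json"})
print(urllib.request.urlopen(req).read().decode("utf-8"))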

Jörg

I'll have to have a look at the code and see...
