Filters, facets, and tuning

Hi there! We are trying to tune our ES cluster: we are seeing many slow
queries (from 200 ms up to 4 s) now that we have launched it for real-world usage.

We have a small index (80 GB, 25M docs; we store the source, which is pretty
big) and 8 document types.

We are using the default 5 shards with 2 replicas (3 servers right now);
each server is a 64 GB box with a 3 GHz Intel Xeon, 24 cores.

Thanks to bigdesk :smiley: we are monitoring the cluster: CPU usage is minimal
(less than 5% on average), and we are seeing around 5-6 QPS on each node.

I'm just starting to scratch the surface of tuning ES (still a long road ahead),
but one thing I would like to do is use filters more often when executing
faceted navigation.

So far our approach has been to append the selected facet value to the query.
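
For example, when a user picks a genre we currently rebuild the query itself, roughly like this (just a sketch; genreName is one of our fields and "Rock" is a placeholder value):

{
  "query": {
    "bool": {
      "must": {
        "term": { "genreName": "Rock" }
      },
      "must_not": {
        "query_string": { "query": "published:F OR active:F" }
      }
    }
  }
}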

This works well, but I think (please correct me if I'm wrong) that the
preferred approach would be to use facet_filters instead.

I noticed that if I add a "filter" to the request, the hits get narrowed
down, but the facet counts do not reflect the new results; they are still
the counts for the original match_all query.
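
That is, with something like this (again just a sketch of what we tried; "Rock" is a placeholder value), the hits get filtered but the facet counts still cover every document matching the main query:

{
  "query": { "match_all": {} },
  "filter": {
    "term": { "genreName": "Rock" }
  },
  "facets": {
    "style": {
      "terms": { "field": "styleName", "size": 999, "all_terms": true }
    }
  }
}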

I also noticed that to get the new counts I need to add the proper
facet_filter to each facet. Is this right? The main problem is that we have
around 20 facets; does that mean I have to keep stacking facet_filters
onto each facet based on the user's interaction?
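
In other words, something like this on every one of the ~20 facets (a sketch; the two term filters stand for two hypothetical user selections):

{
  "query": { "match_all": {} },
  "facets": {
    "style": {
      "terms": { "field": "styleName", "size": 999, "all_terms": true },
      "facet_filter": {
        "and": [
          { "term": { "genreName": "Rock" } },
          { "term": { "explicit": "F" } }
        ]
      }
    }
  }
}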

What is the best way to tackle this: use filters + facet_filters, or just
add more terms to the query to narrow it down?

BTW, this is a sample slow query we have:

{
  "from": 2784,
  "size": 96,
  "query": {
    "bool": {
      "must_not": {
        "query_string": {
          "query": "published:F OR active:F"
        }
      }
    }
  },
  "sort": [
    {
      "albumDownloadsMonth": {
        "order": "desc"
      }
    }
  ],
  "facets": {
    "genre": {
      "terms": {
        "field": "genreName",
        "size": 999,
        "all_terms": true
      }
    },
    "compilation": {
      "terms": {
        "field": "compilation",
        "size": 999,
        "all_terms": true
      }
    },
    "singleEp": {
      "terms": {
        "field": "singleEp",
        "size": 999,
        "all_terms": true
      }
    },
    "editorsPick": {
      "terms": {
        "field": "editorsPick",
        "size": 999,
        "all_terms": true
      }
    },
    "style": {
      "terms": {
        "field": "styleName",
        "size": 999,
        "all_terms": true
      }
    },
    "explicit": {
      "terms": {
        "field": "explicit",
        "size": 999,
        "all_terms": true
      }
    },
    "alpha": {
      "terms": {
        "field": "nameLetter",
        "size": 999,
        "all_terms": true
      }
    },
    "freeTracks": {
      "terms": {
        "field": "freeTracks",
        "size": 999,
        "all_terms": true
      }
    },
    "live": {
      "terms": {
        "field": "live",
        "size": 999,
        "all_terms": true
      }
    },
    "new": {
      "range": {
        "field": "releaseDate",
        "ranges": [
          { "from": "1353165612946" },
          { "from": "1352906412946" },
          { "from": "1355214180242" },
          { "from": "1352039984018" }
        ]
      }
    },
    "decade": {
      "range": {
        "field": "releaseDate",
        "ranges": [
          { "from": "-2208970740000", "to": "-1893437940000" },
          { "from": "-1893437940000", "to": "-1577905140000" },
          { "from": "-1577905140000", "to": "-1262285940000" },
          { "from": "-1262285940000", "to": "-946753140000" },
          { "from": "-946753140000", "to": "-631133940000" },
          { "from": "-631133940000", "to": "-315601140000" },
          { "from": "-315601140000", "to": "18060000" },
          { "from": "18060000", "to": "315550860000" },
          { "from": "315550860000", "to": "631170060000" },
          { "from": "631170060000", "to": "946702860000" },
          { "from": "946702860000", "to": "1262322060000" },
          { "from": "1262322060000", "to": "1577854860000" }
        ]
      }
    },
    "multiDisc": {
      "range": {
        "field": "numDiscs",
        "ranges": [
          { "from": "2" }
        ]
      }
    },
    "rated": {
      "range": {
        "field": "rating",
        "ranges": [
          { "from": "3.50" }
        ]
      }
    },
    "advance": {
      "range": {
        "field": "releaseDate",
        "ranges": [
          { "from": "1353511212946" }
        ]
      }
    }
  }
}

Best regards

--

Answers inline.

HTH

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 21 Nov 2012, at 17:48, Vinicius Carvalho viniciusccarvalho@gmail.com wrote:

> I noticed that if I add a "filter" to the request, the hits get narrowed down, but the facet counts do not reflect the new results; they are still the counts for the original match_all query.

True

> I also noticed that to get the new counts I need to add the proper facet_filter to each facet. Is this right? The main problem is that we have around 20 facets; does that mean I have to keep stacking facet_filters onto each facet based on the user's interaction?

Yes

> What is the best way to tackle this: use filters + facet_filters, or just add more terms to the query to narrow it down?

Add your filters to the facets. Filters reduce the dataset before executing the query, and they are more efficient.
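
For example (an untested sketch based on your genre and style facets; "Rock" is a placeholder value): use a filtered query for the hits and repeat the same filter as a facet_filter on each facet, so the counts follow the selection:

{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must_not": {
            "query_string": { "query": "published:F OR active:F" }
          }
        }
      },
      "filter": {
        "term": { "genreName": "Rock" }
      }
    }
  },
  "facets": {
    "style": {
      "terms": { "field": "styleName", "size": 999, "all_terms": true },
      "facet_filter": { "term": { "genreName": "Rock" } }
    },
    "explicit": {
      "terms": { "field": "explicit", "size": 999, "all_terms": true },
      "facet_filter": { "term": { "genreName": "Rock" } }
    }
  }
}

If the user has selected more than one value, combine them with an and filter and reuse that same filter everywhere. Term filters can be cached, so repeating them should be cheap.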

> BTW, this is a sample slow query we have:

(skipped)

--