Many slow transactions at index_search_slow_log_file


(agodoy) #1

Hello, I'm running searches in Elasticsearch and I'm seeing many with high
response times, some close to 4 seconds.

Configuration:
-OS:
CPU vendor: Intel
CPU model: Core(TM)2 Duo CPU T7700 @ 2.40GHz (2267 MHz)
CPU total cores: 8
CPU sockets: 1 with 6 cores each
CPU cache: 4kb
-Mem:
Refresh interval: 1000ms
Total mem: 31.4gb (33807208448 b)
Total swap: 0b (0 b)
-JVM:
VM name: Java HotSpot(TM) 64-Bit Server VM
VM vendor: Sun Microsystems Inc.
VM version: 20.1-b02
Java version: 1.6.0_26

3-node cluster
2 nodes for indexing and 2 nodes for searches (with transport and client
connections respectively)
1 index with default settings (5 shards, 2 replicas)
60 active shards in the cluster
approximately 100 documents indexed per second (routing by user_id)
ES_MIN_MEM = ES_MAX_MEM = 26g

The slow search log file (index_search_slow_log_file) shows the
following:
...
[2012-08-28 16:39:10,941][WARN ][index.search.slowlog.fetch] [Mimic]
[items][5] took[1s], took_millis[1018], search_type[QUERY_AND_FETCH],
total_shards[1],
source[{"from":0,"size":100,"query":{"constant_score":{"filter":{"and":{"filters":[{"term":{"seller_id":"104802919"}},{"range":{"date_created":{"from":"2012-06-29T20:39:09.876Z","to":"2012-08-28T20:39:09.876Z","include_lower":true,"include_upper":true}}}]}}}},"sort":[{"date_created":{"order":"desc"}}]}],
extra_source[],
[2012-08-28 16:39:50,319][WARN ][index.search.slowlog.fetch] [Mimic]
[items][15] took[1s], took_millis[1037], search_type[QUERY_AND_FETCH],
total_shards[1],
source[{"from":0,"size":100,"query":{"constant_score":{"filter":{"and":{"filters":[{"term":{"seller_id":"68777696"}},{"range":{"date_created":{"from":"2012-06-29T20:39:48.986Z","to":"2012-08-28T20:39:48.986Z","include_lower":true,"include_upper":true}}}]}}}},"sort":[{"date_created":{"order":"desc"}}]}],
extra_source[],
[2012-08-28 16:39:55,922][WARN ][index.search.slowlog.fetch] [Mimic]
[items][8] took[1s], took_millis[1095], search_type[QUERY_AND_FETCH],
total_shards[1],
source[{"from":0,"size":100,"query":{"constant_score":{"filter":{"and":{"filters":[{"term":{"seller_id":"23852555"}},{"range":{"date_created":{"from":"2012-06-29T20:39:54.780Z","to":"2012-08-28T20:39:54.780Z","include_lower":true,"include_upper":true}}}]}}}},"sort":[{"date_created":{"order":"desc"}}]}],
extra_source[],
[2012-08-28 16:40:06,141][WARN ][index.search.slowlog.fetch] [Mimic]
[items][4] took[1.3s], took_millis[1371], search_type[QUERY_AND_FETCH],
total_shards[1],
source[{"from":0,"size":100,"query":{"constant_score":{"filter":{"and":{"filters":[{"term":{"seller_id":"52416364"}},{"range":{"date_created":{"from":"2012-06-29T20:40:04.729Z","to":"2012-08-28T20:40:04.729Z","include_lower":true,"include_upper":true}}}]}}}},"sort":[{"date_created":{"order":"desc"}}]}],
extra_source[],
[2012-08-28 16:40:17,428][WARN ][index.search.slowlog.fetch] [Mimic]
[items][1] took[1.2s], took_millis[1241], search_type[QUERY_AND_FETCH],
total_shards[1],
source[{"from":0,"size":100,"query":{"constant_score":{"filter":{"and":{"filters":[{"term":{"seller_id":"32406782"}},{"range":{"date_created":{"from":"2012-06-29T20:40:16.072Z","to":"2012-08-28T20:40:16.072Z","include_lower":true,"include_upper":true}}}]}}}},"sort":[{"date_created":{"order":"desc"}}]}],
extra_source[],
....

I ran an index optimization (curl -XPOST
'http://localhost:9200/items/_optimize?max_num_segments=3'), which took
about 3 hours. After that, searches improved, but after a few hours the
slow queries occurred again.

I attached screenshots from bigdesk. Any help would be much appreciated,
thanks!

Ana

--


(Martijn Van Groningen) #2

I see that of the 31.4GB of RAM available, 26GB is allocated to the heap
of the ES process. The OS itself also needs sufficient RAM (e.g. for the
filesystem cache). I'd suggest setting ES_HEAP_SIZE (which sets both
ES_MIN_MEM and ES_MAX_MEM) to something like 18GB to 20GB and leaving
the rest of the RAM to the OS. From your bigdesk stats I see you're
using around half of your allocated heap space, so that value seems
fine. I usually give half of the RAM to ES and leave the other half to
the OS.

Are the queries also slow without the sort by date_created? If that is
not the case, then the warmer api in the upcoming 0.20.0 version is
something to look into:
https://github.com/elasticsearch/elasticsearch/issues/1913

Martijn

On 28 August 2012 23:14, Ana G agodoy.ana@gmail.com wrote:


--
Met vriendelijke groet,

Martijn van Groningen

--


(agodoy) #3

Thanks Martijn!
I made the memory change you suggested but it did not improve the
problem :frowning: . Is there some other point I can attack?
Without the sort the queries go fast, so increase memory allocations.
Another question: is the query well formed? I have doubts about whether
to use the "and" filter; can this be the problem?
Thanks again

2012/8/29 Martijn v Groningen martijn.v.groningen@gmail.com


--


(Martijn Van Groningen) #4

On 29 August 2012 15:27, Ana G agodoy.ana@gmail.com wrote:

Thanks Martijn!
I made the change you suggested in terms of memory but did not improve the
problem :frowning: , some other point that can attack?
Ok, too bad. However, I do think the current memory balance is better
than it was before.

Without the sort the queries go fast, so increase memory allocations.
The high search times seem to be related to the sorting. How many times
faster is it compared to the previous measurements? And what do you mean
by memory allocations?

Another question, the query would be well formed? I have doubts about
whether to use the filter "and", can this be the problem?
Let me think... Which returns fewer documents in general: the seller_id
filter or the date_created range filter?

Looking at the filters with regard to caching:

  • The range filter seems to vary by a few ms from query to query,
    right? Is there a reason for this small change in time? It looks like
    the range is ~2 months. If you round the dates in the range (for
    example to midnight), then the results will be fetched from the cache
    most of the time. This should have a very nice impact on your search
    performance.
  • Is the combination of seller_id and date_created unique most of the
    time, or does the combination occur quite often?
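To make the rounding idea concrete, here is a small Python sketch (the helper name and the 60-day window are assumptions for illustration; the field name and ISO date format match the queries in the slow log). Because the endpoints are rounded to midnight, every request during a given day produces a byte-identical filter, which is what lets the filter cache be reused:

```python
from datetime import datetime, timedelta

def rounded_range_filter(field, now, days=60):
    """Build a range filter whose endpoints are rounded down to midnight,
    so consecutive requests produce byte-identical (and thus cacheable)
    filters instead of a new 'now'-based range on every call."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = midnight - timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%S.000Z"
    return {"range": {field: {
        "from": start.strftime(fmt),
        "to": midnight.strftime(fmt),
        "include_lower": True,
        "include_upper": True,
    }}}

# Two requests hours apart now produce the exact same filter:
f1 = rounded_range_filter("date_created", datetime(2012, 8, 28, 16, 39, 10))
f2 = rounded_range_filter("date_created", datetime(2012, 8, 28, 20, 12, 5))
assert f1 == f2
```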

Martijn

--


(agodoy) #5

Some answers below:

2012/8/29 Martijn v Groningen martijn.v.groningen@gmail.com

On 29 August 2012 15:27, Ana G agodoy.ana@gmail.com wrote:

Thanks Martijn!
I made the change you suggested in terms of memory but did not improve
the
problem :frowning: , some other point that can attack?
Ok, too bad. However I do think that the current memory balance is
better than how it was before.

Without the sort the queries go fast, so increase memory allocations.
The high search times seem to be related to the sorting. How many
times faster compared to previous measurements? What do you mean with
memory allocations?

I will take the time to measure each one.

Another question, the query would be well formed? I have doubts about

whether to use the filter "and", can this be the problem?
Let me think... What returns less documents in general the seller_id
filter or date_created range filter?

the seller_id filter

Looking at the filters regarding to caching:

  • The range filter seems to vary by a few ms from query to query,
    right? Is there a reason for this small change in time? Looks like the
    range is ~2 months. If you round the dates (for example to midnight)
    in the range, then the results will be fetched from cache most of the
    time. This should have a very nice impact on your search performance.
  • Is the combination between seller_id and date_created most of time
    unique or does the combination occur quite often?

It seems like a good option, but we would have to modify the consumer of
our search service. Moreover, this is the use case we currently rely on
(the seller_id and date_created combination).

Martijn

--

thank you very much!

--


(Martijn Van Groningen) #6

the seller_id filter
For an 'and' filter, the order of the inner filters seems fine here.
What might also improve things is changing the 'and' filter into a
'bool' filter.
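As a sketch of that rewrite (the helper function is hypothetical; the filter shapes follow the 0.x query DSL used in the slow log above), the wrapped filters of the 'and' filter simply become 'must' clauses of a 'bool' filter:

```python
def and_to_bool(and_filter):
    """Rewrite an 'and' filter into the equivalent 'bool' filter by
    moving the wrapped filters into 'must' clauses."""
    return {"bool": {"must": and_filter["and"]["filters"]}}

# The filter from the slow log, converted:
original = {"and": {"filters": [
    {"term": {"seller_id": "104802919"}},
    {"range": {"date_created": {"from": "2012-06-29", "to": "2012-08-28"}}},
]}}
converted = and_to_bool(original)
```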

seems like a good option but would have to modify the service consumer of
our search, moreover, this is the use case we currently use (seller_id and
date_created combination )
Fair enough. If you can round the dates, then this will certainly
improve the search times.

Besides this, you should also try out the warmer api when it becomes
available.

Martijn

--


(agodoy) #7

Thanks Martijn!
I made the change you suggested and it greatly improved the times!
Slow queries still appear every hour or so. Is there any way to know if
some process is running at that time? Maybe the merge process...
I have the default merge policy (tiered) and only changed
refresh_interval to 5s, but apparently it had no effect, because bigdesk
still shows 1000ms.
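(For reference: refresh_interval is a dynamic setting, so it can also be changed at runtime through the update-settings endpoint rather than the config file. A sketch of the request body in Python; the endpoint path follows the 0.19-era REST API and the index name matches this thread:)

```python
import json

# Body for: curl -XPUT 'http://localhost:9200/items/_settings' -d '<payload>'
# Dynamic index settings like refresh_interval take effect without a restart;
# bigdesk may keep displaying the old value until its stats refresh.
settings_body = {"index": {"refresh_interval": "5s"}}
payload = json.dumps(settings_body)
```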

Greetings!

2012/8/30 Martijn v Groningen martijn.v.groningen@gmail.com


--


(Martijn Van Groningen) #8

On 30 August 2012 14:52, Ana G agodoy.ana@gmail.com wrote:

Thanks Martijn!
I made the change you suggested and greatly improved the times!
Nice!

Slow queries appear every hour more or less. Is there any way to know if
some process is running at that time? maybe the merge process ...
This should tell you if any merges are currently being performed:
localhost:9200/my_index/_stats/merge

Also, since 0.19.9 there is a hot threads api:
http://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-hot-threads.html
This tells you which threads are busy inside your cluster.

I have the default merge policy (tiered) and only changed
refresh_interval to 5s, but apparently it had no effect, because bigdesk
still shows 1000ms.
Tiered is indeed the default merge policy. I think what you're seeing is
related to the field data cache used when sorting on a field. When a new
segment appears, whether from a merge or just from adding docs, the
field data cache entry for your sort field / segment combination hasn't
been loaded into memory yet. It gets loaded during the first search
request after the new segment has been made 'active', which can result
in higher search times, depending on how large the new segment is. The
warmer api can load the field data cache for your sort field + segment
combination before the segment is made 'active'.
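A sketch of what such a warmer registration could look like (the _warmer endpoint and the warmer name here are assumptions based on the API proposed for 0.20; the sort matches the slow queries above). The registered search runs against new segments before they serve live traffic, pre-loading the date_created field data:

```python
import json

# Body for: curl -XPUT 'http://localhost:9200/items/_warmer/sort_by_date'
# The warmer search sorts on date_created, so the field data needed for
# sorting is loaded before live queries hit the new segment.
warmer_body = {
    "query": {"match_all": {}},
    "sort": [{"date_created": {"order": "desc"}}],
}
payload = json.dumps(warmer_body)
```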

Martijn

--


(Clinton Gormley) #9

Hi Martijn

For an 'and' filter the order of inner filter seems fine here. What
might also improve the filter is if you change the 'and' filter into a
'bool' filter.

It'd be good to know when changing an and/or filter to a bool filter
would be beneficial, and when it wouldn't.

Any chance of putting together a short explanation?

ta

clint


--


(Martijn Van Groningen) #10

From what I understand, the main difference between an 'and' filter and
a 'bool' filter is that the 'and' filter iterates over the documents to
match only once. The first wrapped filter produces the documents to loop
over for the second wrapped filter, and so on. The 'bool' filter works
differently: the wrapped filters are basically executed separately, each
filter result is added (with a bitwise-and operation) to an internal
bitset, and this bitset is finally emitted as the result. Operations on
this internal bitset are efficient and fast.

In the case where one filter excludes a lot of documents and another
doesn't, the 'and' filter is most likely the better filter to use. The
filter that excludes many documents should come first. That way the
second filter only needs to try to match documents that matched the
first filter (in fact it can skip over all documents that didn't match
the first filter). In the case where the filters don't exclude a lot of
documents, it is usually better to use the 'bool' filter.
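The two strategies can be illustrated with plain Python sets standing in for per-segment bitsets (a toy model of the cost difference, not Lucene's actual implementation). Both give the same answer; what differs is how much work each does when one filter is far more selective than the other:

```python
# Toy model: doc IDs 0..9; each filter is the set of doc IDs it matches.
seller_filter = {3, 7}                  # selective: one seller's docs
date_filter = set(range(10)) - {0}      # broad: almost all docs in range

# 'bool'-style: materialize each filter's full bitset, then intersect.
bool_result = seller_filter & date_filter

# 'and'-style: iterate only the candidates from the first (most
# selective) filter, probing the second filter per candidate.
and_result = {doc for doc in seller_filter if doc in date_filter}

# Same result either way; the 'and' style only examined 2 candidates,
# while the 'bool' style materialized both filters in full.
assert bool_result == and_result == {3, 7}
```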

Some filters, like the geo distance filter, do a computation per
candidate document in order to determine whether it matches. If the geo
distance filter is used inside a bool filter, it would compute the
distance for all documents, even for documents that don't match the
other filters. In this case it is most of the time better to use an
'and' filter and add the geo distance filter as the last filter.

Martijn

On 31 August 2012 15:59, Clinton Gormley clint@traveljury.com wrote:


--


(Clinton Gormley) #11

Very clear and helpful explanation.

thanks

On Mon, 2012-09-03 at 09:53 +0200, Martijn v Groningen wrote:


--


(agodoy) #12

Hello all!
I still get these slow transactions approximately every two minutes,
which is quite frustrating. I made the changes Martijn suggested for
caching the queries, but with no results.
On the other hand, although I updated the index refresh interval to 5
seconds, I still see the default (1000ms).
I appreciate any help you can give me! Thanks!
Ana G

2012/9/3 Clinton Gormley clint@traveljury.com


--


(system) #13