Many slow transactions at index_search_slow_log_file


(agodoy) #1

Hello, I'm running searches in Elasticsearch and I'm seeing many with high
response times, some close to 4 seconds.

Configuration:
-OS:
CPU vendor: Intel
CPU model: Core(TM)2 Duo CPU T7700 @ 2.40GHz (2267 MHz)
CPU total cores: 8
CPU sockets: 1 with 6 cores each
CPU cache: 4kb
-Mem:
Refresh interval: 1000ms
Total mem: 31.4gb (33807208448 b)
Total swap: 0b (0 b)
-JVM:
VM name: Java HotSpot(TM) 64-Bit Server VM
VM vendor: Sun Microsystems Inc.
VM version: 20.1-b02
Java version: 1.6.0_26

3-node cluster
2 nodes for indexing and 2 nodes for searches (with transport and client
connections respectively)
1 index with default settings (5 shards, 2 replicas)
60 active shards in the cluster
approximately 100 documents indexed per second (routing by user_id)
ES_MIN_MEM = ES_MAX_MEM = 26g

The slow search log file (index_search_slow_log_file) shows the
following:
...
[2012-08-28 16:39:10,941][WARN ][index.search.slowlog.fetch] [Mimic]
[items][5] took[1s], took_millis[1018], search_type[QUERY_AND_FETCH],
total_shards[1],
source[{"from":0,"size":100,"query":{"constant_score":{"filter":{"and":{"filters":[{"term":{"seller_id":"104802919"}},{"range":{"date_created":{"from":"2012-06-29T20:39:09.876Z","to":"2012-08-28T20:39:09.876Z","include_lower":true,"include_upper":true}}}]}}}},"sort":[{"date_created":{"order":"desc"}}]}],
extra_source[],
[2012-08-28 16:39:50,319][WARN ][index.search.slowlog.fetch] [Mimic]
[items][15] took[1s], took_millis[1037], search_type[QUERY_AND_FETCH],
total_shards[1],
source[{"from":0,"size":100,"query":{"constant_score":{"filter":{"and":{"filters":[{"term":{"seller_id":"68777696"}},{"range":{"date_created":{"from":"2012-06-29T20:39:48.986Z","to":"2012-08-28T20:39:48.986Z","include_lower":true,"include_upper":true}}}]}}}},"sort":[{"date_created":{"order":"desc"}}]}],
extra_source[],
[2012-08-28 16:39:55,922][WARN ][index.search.slowlog.fetch] [Mimic]
[items][8] took[1s], took_millis[1095], search_type[QUERY_AND_FETCH],
total_shards[1],
source[{"from":0,"size":100,"query":{"constant_score":{"filter":{"and":{"filters":[{"term":{"seller_id":"23852555"}},{"range":{"date_created":{"from":"2012-06-29T20:39:54.780Z","to":"2012-08-28T20:39:54.780Z","include_lower":true,"include_upper":true}}}]}}}},"sort":[{"date_created":{"order":"desc"}}]}],
extra_source[],
[2012-08-28 16:40:06,141][WARN ][index.search.slowlog.fetch] [Mimic]
[items][4] took[1.3s], took_millis[1371], search_type[QUERY_AND_FETCH],
total_shards[1],
source[{"from":0,"size":100,"query":{"constant_score":{"filter":{"and":{"filters":[{"term":{"seller_id":"52416364"}},{"range":{"date_created":{"from":"2012-06-29T20:40:04.729Z","to":"2012-08-28T20:40:04.729Z","include_lower":true,"include_upper":true}}}]}}}},"sort":[{"date_created":{"order":"desc"}}]}],
extra_source[],
[2012-08-28 16:40:17,428][WARN ][index.search.slowlog.fetch] [Mimic]
[items][1] took[1.2s], took_millis[1241], search_type[QUERY_AND_FETCH],
total_shards[1],
source[{"from":0,"size":100,"query":{"constant_score":{"filter":{"and":{"filters":[{"term":{"seller_id":"32406782"}},{"range":{"date_created":{"from":"2012-06-29T20:40:16.072Z","to":"2012-08-28T20:40:16.072Z","include_lower":true,"include_upper":true}}}]}}}},"sort":[{"date_created":{"order":"desc"}}]}],
extra_source[],
....

I ran an index optimization (curl -XPOST
'http://localhost:9200/items/_optimize?max_num_segments=3'), which took
about 3 hours. After that, searches improved, but after a few hours the
slow queries occurred again.

I attached screenshots from bigdesk. Any help would be much appreciated,
thanks!

Ana

--


(Martijn Van Groningen) #2

I see that of the 31.4GB of RAM available, 26GB is allocated to the heap
of the ES process. The OS itself also needs sufficient RAM (e.g. for the
filesystem cache). I'd suggest setting ES_HEAP_SIZE (which sets both
ES_MIN_MEM and ES_MAX_MEM) to something like 18GB to 20GB and leaving
the rest of the RAM to the OS. From your bigdesk stats I see you're
using around half of your allocated heap space, so that value seems
fine. I usually give half of the RAM to ES and leave the other half to
the OS.

Are the queries also slow without the sort by date_created? If that is
not the case, then the warmer api in the upcoming 0.20.0 version is
something to look into:
https://github.com/elasticsearch/elasticsearch/issues/1913

Martijn

On 28 August 2012 23:14, Ana G agodoy.ana@gmail.com wrote:


--
Met vriendelijke groet,

Martijn van Groningen

--


(agodoy) #3

Thanks Martijn!
I made the memory change you suggested but it did not improve the
problem :frowning: . Is there some other point I can attack?
Without the sort the queries go fast, so increase memory allocations.
Another question: is the query well formed? I have doubts about whether
to use the "and" filter; can this be the problem?
Thanks again

2012/8/29 Martijn v Groningen martijn.v.groningen@gmail.com


--


(Martijn Van Groningen) #4

On 29 August 2012 15:27, Ana G agodoy.ana@gmail.com wrote:

Thanks Martijn!
I made the change you suggested in terms of memory but did not improve the
problem :frowning: , some other point that can attack?
Ok, too bad. However, I do think the current memory balance is better
than it was before.

Without the sort the queries go fast, so increase memory allocations.
The high search times seem to be related to the sorting. How many times
faster is it compared to the previous measurements? And what do you mean
by memory allocations?

Another question, the query would be well formed? I have doubts about
whether to use the filter "and", can this be the problem?
Let me think... Which returns fewer documents in general: the seller_id
filter or the date_created range filter?

Looking at the filters with regard to caching:

  • The range filter seems to vary by a few ms from query to query,
    right? Is there a reason for this small change in time? It looks like
    the range is ~2 months. If you round the dates in the range (for
    example to midnight), then the results will be fetched from the cache
    most of the time. This should have a very nice impact on your search
    performance.
  • Is the combination of seller_id and date_created unique most of the
    time, or does the combination occur quite often?
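To make the rounding idea concrete, here is a small Python sketch (the helper name and the 60-day window are assumptions for illustration; the field name and ISO date format match the queries in the slow log). Because the endpoints are rounded to midnight, every request during a given day produces a byte-identical filter, which is what lets the filter cache be reused:

```python
from datetime import datetime, timedelta

def rounded_range_filter(field, now, days=60):
    """Build a range filter whose endpoints are rounded down to midnight,
    so consecutive requests produce byte-identical (and thus cacheable)
    filters instead of a new 'now'-based range on every call."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = midnight - timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%S.000Z"
    return {"range": {field: {
        "from": start.strftime(fmt),
        "to": midnight.strftime(fmt),
        "include_lower": True,
        "include_upper": True,
    }}}

# Two requests hours apart now produce the exact same filter:
f1 = rounded_range_filter("date_created", datetime(2012, 8, 28, 16, 39, 10))
f2 = rounded_range_filter("date_created", datetime(2012, 8, 28, 20, 12, 5))
assert f1 == f2
```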

Martijn

--


(agodoy) #5

Some answers below:

2012/8/29 Martijn v Groningen martijn.v.groningen@gmail.com

On 29 August 2012 15:27, Ana G agodoy.ana@gmail.com wrote:

Thanks Martijn!
I made the change you suggested in terms of memory but did not improve
the
problem :frowning: , some other point that can attack?
Ok, too bad. However I do think that the current memory balance is
better than how it was before.

Without the sort the queries go fast, so increase memory allocations.
The high search times seem to be related to the sorting. How many
times faster compared to previous measurements? What do you mean with
memory allocations?

I will take the time to measure each one.

Another question, the query would be well formed? I have doubts about

whether to use the filter "and", can this be the problem?
Let me think... What returns less documents in general the seller_id
filter or date_created range filter?

the seller_id filter

Looking at the filters regarding to caching:

  • The range filter seems to vary by a few ms from query to query,
    right? Is there a reason for this small change in time? Looks like the
    range is ~2 months. If you round the dates (for example to midnight)
    in the range, then the results will be fetched from cache most of the
    time. This should have a very nice impact on your search performance.
  • Is the combination between seller_id and date_created most of time
    unique or does the combination occur quite often?

It seems like a good option, but we would have to modify the consumer of
our search service. Moreover, this is the use case we currently rely on
(the seller_id and date_created combination).

Martijn

--

thank you very much!

--


(Martijn Van Groningen) #6

the seller_id filter
For an 'and' filter, the order of the inner filters seems fine here.
What might also improve things is changing the 'and' filter into a
'bool' filter.
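As a sketch of that rewrite (the helper function is hypothetical; the filter shapes follow the 0.x query DSL used in the slow log above), the wrapped filters of the 'and' filter simply become 'must' clauses of a 'bool' filter:

```python
def and_to_bool(and_filter):
    """Rewrite an 'and' filter into the equivalent 'bool' filter by
    moving the wrapped filters into 'must' clauses."""
    return {"bool": {"must": and_filter["and"]["filters"]}}

# The filter from the slow log, converted:
original = {"and": {"filters": [
    {"term": {"seller_id": "104802919"}},
    {"range": {"date_created": {"from": "2012-06-29", "to": "2012-08-28"}}},
]}}
converted = and_to_bool(original)
```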

seems like a good option but would have to modify the service consumer of
our search, moreover, this is the use case we currently use (seller_id and
date_created combination )
Fair enough. If you can round the dates, then this will certainly
improve the search times.

Besides this, you should also try out the warmer api when it becomes
available.

Martijn

--


(agodoy) #7

Thanks Martijn!
I made the change you suggested and it greatly improved the times!
Slow queries still appear every hour or so. Is there any way to know if
some process is running at that time? Maybe the merge process...
I have the default merge policy (tiered) and only changed
refresh_interval to 5s, but apparently it had no effect, because bigdesk
still shows 1000ms.
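(For reference: refresh_interval is a dynamic setting, so it can also be changed at runtime through the update-settings endpoint rather than the config file. A sketch of the request body in Python; the endpoint path follows the 0.19-era REST API and the index name matches this thread:)

```python
import json

# Body for: curl -XPUT 'http://localhost:9200/items/_settings' -d '<payload>'
# Dynamic index settings like refresh_interval take effect without a restart;
# bigdesk may keep displaying the old value until its stats refresh.
settings_body = {"index": {"refresh_interval": "5s"}}
payload = json.dumps(settings_body)
```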

Greetings!

2012/8/30 Martijn v Groningen martijn.v.groningen@gmail.com


--


(Martijn Van Groningen) #8

On 30 August 2012 14:52, Ana G agodoy.ana@gmail.com wrote:

Thanks Martijn!
I made the change you suggested and greatly improved the times!
Nice!

Slow queries appear every hour more or less. Is there any way to know if
some process is running at that time? maybe the merge process ...
This should tell you if any merges are currently being performed:
localhost:9200/my_index/_stats/merge

Also, since 0.19.9 there is a hot threads api:
http://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-hot-threads.html
This tells you which threads are busy inside your cluster.

I have the default merge policy (tiered) and only changed
refresh_interval to 5s, but apparently it had no effect, because bigdesk
still shows 1000ms.
Tiered is indeed the default merge policy. I think what you're seeing is
related to the field data cache used when sorting on a field. When a new
segment appears, whether from a merge or just from adding docs, the
field data cache entry for your sort field / segment combination hasn't
been loaded into memory yet. It gets loaded during the first search
request after the new segment has been made 'active', which can result
in higher search times, depending on how large the new segment is. The
warmer api can load the field data cache for your sort field + segment
combination before the segment is made 'active'.
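A sketch of what such a warmer registration could look like (the _warmer endpoint and the warmer name here are assumptions based on the API proposed for 0.20; the sort matches the slow queries above). The registered search runs against new segments before they serve live traffic, pre-loading the date_created field data:

```python
import json

# Body for: curl -XPUT 'http://localhost:9200/items/_warmer/sort_by_date'
# The warmer search sorts on date_created, so the field data needed for
# sorting is loaded before live queries hit the new segment.
warmer_body = {
    "query": {"match_all": {}},
    "sort": [{"date_created": {"order": "desc"}}],
}
payload = json.dumps(warmer_body)
```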

Martijn

--


(Clinton Gormley) #9

Hi Martijn

For an 'and' filter the order of inner filter seems fine here. What
might also improve the filter is if you change the 'and' filter into a
'bool' filter.

It'd be good to know when changing an and/or filter to a bool filter
would be beneficial, and when it wouldn't.

Any chance of putting together a short explanation?

ta

clint


--


(Martijn Van Groningen) #10

From what I understand, the main difference between an 'and' filter and
a 'bool' filter is that the 'and' filter iterates over the documents to
match only once. The first wrapped filter produces the documents to loop
over for the second wrapped filter, and so on. The 'bool' filter works
differently: the wrapped filters are basically executed separately, each
filter result is added (with a bitwise-and operation) to an internal
bitset, and this bitset is finally emitted as the result. Operations on
this internal bitset are efficient and fast.

In the case where one filter excludes a lot of documents and another
doesn't, the 'and' filter is most likely the better filter to use. The
filter that excludes many documents should come first. That way the
second filter only needs to try to match documents that matched the
first filter (in fact it can skip over all documents that didn't match
the first filter). In the case where the filters don't exclude a lot of
documents, it is usually better to use the 'bool' filter.
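The two strategies can be illustrated with plain Python sets standing in for per-segment bitsets (a toy model of the cost difference, not Lucene's actual implementation). Both give the same answer; what differs is how much work each does when one filter is far more selective than the other:

```python
# Toy model: doc IDs 0..9; each filter is the set of doc IDs it matches.
seller_filter = {3, 7}                  # selective: one seller's docs
date_filter = set(range(10)) - {0}      # broad: almost all docs in range

# 'bool'-style: materialize each filter's full bitset, then intersect.
bool_result = seller_filter & date_filter

# 'and'-style: iterate only the candidates from the first (most
# selective) filter, probing the second filter per candidate.
and_result = {doc for doc in seller_filter if doc in date_filter}

# Same result either way; the 'and' style only examined 2 candidates,
# while the 'bool' style materialized both filters in full.
assert bool_result == and_result == {3, 7}
```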

Some filters, like the geo distance filter, do a computation per
candidate document in order to determine whether it matches. If the geo
distance filter is used inside a bool filter, it would compute the
distance for all documents, even for documents that don't match the
other filters. In this case it is most of the time better to use an
'and' filter and add the geo distance filter as the last filter.

Martijn

On 31 August 2012 15:59, Clinton Gormley clint@traveljury.com wrote:


--


(Clinton Gormley) #11

Very clear and helpful explanation.

thanks

On Mon, 2012-09-03 at 09:53 +0200, Martijn v Groningen wrote:


--


(agodoy) #12

Hello all!
I still get these slow transactions approximately every two minutes,
which is quite frustrating. I made the changes Martijn suggested for
caching the queries, but with no results.
On the other hand, although I updated the index refresh interval to 5
seconds, I still see the default (1000ms).
I appreciate any help you can give me! Thanks!
Ana G

2012/9/3 Clinton Gormley clint@traveljury.com


--


(system) #13