Filter cache invalidation


(Gabe Gorelick-Feldman) #1

Does elasticsearch invalidate your filter caches automatically after a
write? The only documentation I've found on filter cache invalidation
(besides time-based expiry) is on
indices.cache.filter.terms.expire_after_write, which is disabled by
default. But what about other filters besides terms lookup? Should an
application that's concerned about consistency disable filter caching
altogether? Or do you have to explicitly clear every applicable filter
cache after a write?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c75c8e57-91ef-42f1-94df-abc08ee3ab12%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Nik Everett) #2

Adding data "just works". I believe filter caches are per segment.
Lucene (and therefore Elasticsearch) updates a document by marking it as
deleted in its old segment and re-adding it in a new segment. So the filter
cache for the old segment can stay: a lookup just hits the tombstone for
the document and skips it, and the new segment is cached on the next query
that includes the filter.
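
The per-segment behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Lucene's or Elasticsearch's actual code: the names ToyIndex, Segment, filter_cache, and cache_misses are all hypothetical, and each write creates its own segment for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical immutable batch of documents plus a set of tombstones."""
    docs: dict                                   # doc_id -> body, never modified
    deleted: set = field(default_factory=set)    # ids marked as deleted


class ToyIndex:
    def __init__(self):
        self.segments = []
        self.filter_cache = {}   # (segment number, filter key) -> matching doc ids
        self.cache_misses = 0

    def index(self, doc_id, body):
        # An update tombstones any older copy, then writes the doc to a new segment.
        for seg in self.segments:
            if doc_id in seg.docs:
                seg.deleted.add(doc_id)
        self.segments.append(Segment({doc_id: body}))

    def search(self, key, predicate):
        hits = []
        for n, seg in enumerate(self.segments):
            if (n, key) not in self.filter_cache:
                self.cache_misses += 1
                self.filter_cache[(n, key)] = frozenset(
                    d for d, body in seg.docs.items() if predicate(body))
            # The cached ids stay valid; tombstones are applied at read time.
            hits += [d for d in self.filter_cache[(n, key)] if d not in seg.deleted]
        return sorted(hits)


def is_admin(body):
    return body["role"] == "admin"


idx = ToyIndex()
idx.index("1", {"role": "admin"})
before = idx.search("role:admin", is_admin)
idx.index("1", {"role": "guest"})      # overwrite: old copy gets a tombstone
after = idx.search("role:admin", is_admin)
print(before, after)
```

In this model the cache entry for the old segment is never cleared, yet the second search no longer returns the overwritten document, because the tombstone is checked when the cached entry is read.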

Nik



(Gabe Gorelick-Feldman) #3

Very cool, but what about terms lookup? The docs mention that it uses an LRU
cache. Is that separate from the per-segment filter cache? I ask because in
my naive little tests I seem to be getting stale results from my terms lookup
filters, even with indices.cache.filter.terms.expire_after_write set.



(Nik Everett) #4

You'd have to post an example for me to be sure what's up but I don't
expect terms filters to be different from other filters.



(Gabe Gorelick-Feldman) #5

I double-checked, and it looks like the terms lookup filter is not
invalidated. Here are some steps to reproduce:

First, seed your data:

PUT /user/user/1
{
  "roles": ["admin"]
}

POST /foo/bar
{
  "a": 1,
  "role": "admin"
}

This query will return 1 hit, as expected:

POST /foo/bar/_search
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "terms": {
          "role": {
            "index": "user",
            "type": "user",
            "id": "1",
            "path": "roles"
          },
          "_cache_key": "user_1_roles"
        }
      }
    }
  }
}

Then, update user 1's roles:

PUT /user/user/1
{
  "roles": []
}

If you run the search again, you'd expect no results, but instead you get 1
hit. If you clear the cache:

POST /foo/_cache/clear?filter_keys=user_1_roles

and then run the search again, you get no results.
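
One way to picture the behaviour in these steps is a cache of looked-up terms that is keyed by the cache key and is not tied to writes on the lookup index. The sketch below is a toy model of that idea only; ToyTermsLookup and its methods are hypothetical names, and it is a guess that fits the observed results, not a description of how Elasticsearch implements terms lookup.

```python
class ToyTermsLookup:
    """Hypothetical cache of terms fetched from a lookup document."""

    def __init__(self, lookup_index):
        self.lookup_index = lookup_index   # doc id -> document body
        self.cache = {}                    # cache key -> tuple of looked-up terms

    def terms(self, doc_id, path, cache_key):
        # Only fetch the lookup document when the key is not cached yet.
        if cache_key not in self.cache:
            self.cache[cache_key] = tuple(self.lookup_index[doc_id][path])
        return self.cache[cache_key]

    def clear(self, cache_key):
        self.cache.pop(cache_key, None)


users = {"1": {"roles": ["admin"]}}
lookup = ToyTermsLookup(users)

first = lookup.terms("1", "roles", "user_1_roles")
users["1"] = {"roles": []}                 # the write does not touch the cache
stale = lookup.terms("1", "roles", "user_1_roles")
lookup.clear("user_1_roles")               # explicit clear, as in the step above
fresh = lookup.terms("1", "roles", "user_1_roles")
print(first, stale, fresh)
```

In the model, the second call still returns the old roles, and only the explicit clear makes the next call re-read the lookup document, which matches the sequence of results reported above.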



(Jörg Prante) #6

I modified your example a little bit, in this gist:

https://gist.github.com/jprante/042aaa910e47ebf4536b

But I find that the filter cache is invalidated by the refresh after
overwriting an existing doc.

Maybe your example is confused because of the two indices you use?

Jörg



(Gabe Gorelick-Feldman) #7

It looks like there was some discussion about this last
year: https://github.com/elasticsearch/elasticsearch/issues/3219. The
consensus in that issue seems to be to disable terms lookup caching with
cache:false where you care about consistency. This is probably good
enough for me, although it's a shame updating the segment doesn't
invalidate the terms lookup cache. Is that worth filing an issue for?
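
For reference, a request body with lookup caching turned off might be built as in the sketch below. The helper terms_lookup_query is hypothetical, and placing "cache": false inside the lookup object (next to index, type, id, and path) is an assumption based on the 1.x terms filter documentation; it should be checked against the docs for the version in use.

```python
import json


def terms_lookup_query(field, index, doc_type, doc_id, path, cache=True):
    """Build a filtered query whose terms come from another document."""
    lookup = {"index": index, "type": doc_type, "id": doc_id, "path": path}
    if not cache:
        # Assumed placement of the flag: inside the lookup object.
        lookup["cache"] = False
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"terms": {field: lookup}},
            }
        }
    }


body = terms_lookup_query("role", "user", "user", "1", "roles", cache=False)
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent as the body of POST /foo/bar/_search in place of the query from the earlier reproduction steps.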



(Gabe Gorelick-Feldman) #8

@Jörg

It looks like you're using a plain terms filter. I'm trying to do a terms
lookup filter. This blog post does a good job describing terms lookup [1].

[1] http://www.elasticsearch.org/blog/terms-filter-lookup/

On Tuesday, May 13, 2014 5:56:19 PM UTC-4, Jörg Prante wrote:

I modified your example a little bit, in this gist

https://gist.github.com/jprante/042aaa910e47ebf4536b

but I find the filter cache is invalidated with the refresh after
overwriting an existing doc.

Maybe your example is confused because of the two indices you use?

Jörg


