Having strange problems with "term lookup" filter

I am really confused, and keep getting erratic results when I use the term
lookup filter. I am hoping someone could give me a model for reasoning
about it, as I've been unable to find anything in the documentation that
explains the behavior.

I run a 5-node cluster with a 50-shard index, replicated 2x.

Notable elasticsearch.yml settings are:

indices.cache.filter.terms.expire_after_write: 1s
indices.cache.filter.terms.expire_after_access: 30s
index.number_of_shards: 50
index.number_of_replicas: 1

I then have a document:

POST: luna_2/Domain/4?routing=15793206719&refresh=true

{"public_index_ids":[370846021070]}

The following query returns results:

POST: luna_2/Task/_search?routing=15793206719
{
  "filter": {
    "terms": {
      "_cache_key": "foo",
      "_cache": false,
      "index_ids": [
        370846021070
      ]
    }
  }
}

While this:

{
  "filter": {
    "terms": {
      "_cache_key": "bar",
      "_cache": false,
      "index_ids": {
        "index": "luna_2",
        "type": "Domain",
        "id": "4",
        "path": "public_index_ids"
      }
    }
  }
}

Behaves erratically. When I first built the index, it returned the correct
results.

I then modified luna_2/Domain/4 to have an empty public_index_ids array,
and re-issued the query.

It continued to return the same results (even though it should have started
returning none). I then changed the _cache_key (even though I had asked for
the filter not to be cached), and it returned an empty result set.

I then changed luna_2/Domain/4 to the original value
{"public_index_ids":[370846021070]}

And now, the query is still returning an empty result set, even with a new
cache key.

I can't figure out what could be happening. Are there levels of caching
that I am missing? It seems like the cache for the Domain/4 document should
expire after 30 seconds, yet something is getting kept around.

Are there any tools to inspect the cache, or issue a query that doesn't use
the cache?

Any help would be much appreciated.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hey,

can you create a gist, in order to reproduce the issue?
Also, this might help (to explain at least some parts of that erratic
behaviour): https://github.com/elasticsearch/elasticsearch/issues/3219

--Alex


Yes, this issue does explain a ton! I was assuming that _cache: false would
work, or that changing the looked up document would automatically
invalidate the cached filter. I will set things up to manually invalidate
the cache_key, and see how that goes.
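
A minimal sketch of that manual invalidation, in Python. It assumes the application that writes Domain/4 also knows the current public_index_ids, so it can derive a new _cache_key whenever those terms change. The helper names (lookup_cache_key, terms_lookup_filter) are hypothetical, and whether a fresh key fully bypasses every cache involved is not established in this thread.

```python
import hashlib
import json


def lookup_cache_key(index, doc_type, doc_id, path, terms):
    """Derive a cache key that changes whenever the looked-up terms change.

    Hypothetical helper: the key format is an arbitrary choice for illustration.
    """
    # Hash the sorted terms so the same set of terms always yields the same key.
    digest = hashlib.sha1(json.dumps(sorted(terms)).encode("utf-8")).hexdigest()[:12]
    return "{}/{}/{}/{}#{}".format(index, doc_type, doc_id, path, digest)


def terms_lookup_filter(field, index, doc_type, doc_id, path, terms):
    """Build a search body using a terms lookup with a content-derived _cache_key."""
    return {
        "filter": {
            "terms": {
                "_cache_key": lookup_cache_key(index, doc_type, doc_id, path, terms),
                field: {
                    "index": index,
                    "type": doc_type,
                    "id": doc_id,
                    "path": path,
                },
            }
        }
    }


# Body for the query from the original post, keyed on the current terms.
body = terms_lookup_filter(
    "index_ids", "luna_2", "Domain", "4", "public_index_ids", [370846021070]
)
print(json.dumps(body, indent=2))
```

The idea is that the key only changes when the Domain document's terms change, so unchanged lookups can still share a cached filter.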

Thanks very much!


Ok, I can now repro one of the problems, here is the
gist: https://gist.github.com/anonymous/5854123

I think the issue might be with the lookup ignoring the "routing"
parameter. If I create the domain without a routing, things work fine.
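
If the lookup really is fetching Domain/4 without the routing value, one thing to try is passing the routing inside the lookup itself. Later Elasticsearch documentation lists a "routing" field for the terms lookup; whether the version used here honors it is an assumption, and the helper name below is hypothetical. A rough Python sketch of building such a body:

```python
import json


def terms_lookup_with_routing(field, index, doc_type, doc_id, path, routing=None):
    """Build a terms lookup filter body, optionally with a routing value.

    Assumption: the lookup accepts a "routing" field, as described in later
    Elasticsearch documentation; this has not been verified for this cluster.
    """
    lookup = {"index": index, "type": doc_type, "id": doc_id, "path": path}
    if routing is not None:
        # Routing values are sent as strings, matching the ?routing= URL parameter.
        lookup["routing"] = str(routing)
    return {"filter": {"terms": {field: lookup}}}


# Same lookup as in the original post, with the routing used to index Domain/4.
body = terms_lookup_with_routing(
    "index_ids", "luna_2", "Domain", "4", "public_index_ids", routing=15793206719
)
print(json.dumps(body, indent=2))
```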
