Facets with all_terms return deleted terms

Hi,

I'm building a search/filter interface based on Elasticsearch 0.20.x.

I have books with the following schema:

  • title
  • author
  • publisher

I sometimes need to remove all books from a given publisher, for various
reasons.
Another business requirement is that all available publishers should be
listed under the publisher filter, even when the other filter criteria
exclude all books.

I use a terms facet with all_terms, and everything went fine until I removed
a publisher: it is still listed in the facet result.

Is there any way to reset the facet list?

Julien.

Some sample data:

{author: "Terry Pratchet", title: "Discworld 1", publisher: "Harper Collins"}
{author: "Terry Pratchet", title: "Discworld 2", publisher: "Harper Collins"}
{author: "Terry Pratchet", title: "Discworld 3", publisher: "Harper Collins"}
{author: "Terry Pratchet", title: "Discworld 4", publisher: "Harper Collins"}
{author: "Douglas Adams", title: "H2G2", publisher: "Geoffrey Perkins"}

The following request lists all publishers, even the deleted ones.

$ curl -XPOST 'http://localhost:9200/books/_search' -d '{
  "from": 0,
  "size": 50,
  "query": { "match_all": {} },
  "facets": {
    "publisher": {
      "terms": {
        "field": "publisher",
        "size": 70,
        "all_terms": true,
        "order": "term"
      }
    }
  }
}'

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

How do you delete a publisher? What does the command look like?

Do you optimize the index after deletion?

Jörg

> I use all_terms facets and everything went fine until I removed a
> publisher: it is still listed in the facet result.

The terms list includes terms in documents that have been deleted.
Deleting a document just marks it as deleted and excludes it from future
searches.

Only once the segment that the document was in is rewritten will the doc
(and the terms) be truly expunged.

That happens automatically in the background as segments are merged, or
you can force it with the optimize API (which can be quite heavy).
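
To illustrate the mechanism described above, here is a toy model in Python. It is not Elasticsearch or Lucene code: the `Segment` class and the `delete_where` and `merge` helpers are invented for this sketch, and it only assumes what is stated above (a delete marks a document, and terms disappear once the segment is rewritten).

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy stand-in for an immutable index segment (not real Lucene code)."""
    docs: list                                  # documents, never edited in place
    deleted: set = field(default_factory=set)   # positions marked as deleted

    def live_docs(self):
        # Searches only see documents that are not marked as deleted.
        return [d for i, d in enumerate(self.docs) if i not in self.deleted]

    def terms(self, field_name):
        # The term list is derived from everything written to the segment,
        # so it still covers documents that were marked deleted afterwards.
        return sorted({d[field_name] for d in self.docs})


def delete_where(segment, field_name, value):
    """Mark matching documents as deleted; the stored docs are untouched."""
    for i, d in enumerate(segment.docs):
        if d[field_name] == value:
            segment.deleted.add(i)


def merge(segments):
    """Rewrite segments into a new one, copying only the live documents."""
    return Segment([d for s in segments for d in s.live_docs()])


seg = Segment([
    {"title": "Discworld 1", "publisher": "Harper Collins"},
    {"title": "H2G2", "publisher": "Geoffrey Perkins"},
])
delete_where(seg, "publisher", "Harper Collins")

print([d["title"] for d in seg.live_docs()])  # ['H2G2']
print(seg.terms("publisher"))   # ['Geoffrey Perkins', 'Harper Collins']

merged = merge([seg])
print(merged.terms("publisher"))  # ['Geoffrey Perkins']
```

In this model the deleted publisher stays in the term list until `merge` rewrites the segment, which is the role the background merges or the optimize API would play.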

clint

Wow, that was fast!

I'm using Elastica for deletion; I'll dig a bit to find out the exact query.

Thanks.

Hi Clinton,

Thanks for your answer, I understand better how it works internally.

I tried the optimize, refresh and clear-cache APIs, but the deleted
publisher still shows up in the facet results.

curl -XPOST 'http://localhost:9200/_all/_optimize'
curl -XPOST 'http://localhost:9200/_all/_optimize' -d \
  '{"max_num_segments":1,"only_expunge_deletes":true,"wait_for_merge":true}'
curl -XPOST 'http://localhost:9200/eliteauto/_cache/clear'
curl -XPOST 'http://localhost:9200/_refresh'

Each call returns the expected response:

{"ok":true,"_shards":{"total":1,"successful":1,"failed":0}}

Am I doing something wrong?

Hiya

> Thanks for your answer, I understand better how it works internally.

Actually, I appear to have misunderstood how terms facets are
calculated. It appears that deletions are taken into account: terms that
occur only in deleted documents are not returned.

See: https://gist.github.com/clintongormley/5220985

This at least works on 0.90.0.RC1. I haven't tested it on 0.20.x.

Which leads me to believe that you still have documents with that
publisher.
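
One way to check this from the client side, sketched in Python: run the facet with a match_all query and look at the counts. This assumes the terms facet response has the shape `facets.<name>.terms[]` with `term` and `count` keys, and that a term kept only because of all_terms comes back with a count of 0. The function name and the sample response are made up for illustration, and the sample assumes the publisher field is not analyzed.

```python
def split_facet_terms(response, facet_name):
    """Split a terms facet into terms that matched documents and terms that did not."""
    entries = response["facets"][facet_name]["terms"]
    with_docs = [e["term"] for e in entries if e["count"] > 0]
    without_docs = [e["term"] for e in entries if e["count"] == 0]
    return with_docs, without_docs


# Hypothetical search response, trimmed to the facet part.
sample = {
    "facets": {
        "publisher": {
            "_type": "terms",
            "terms": [
                {"term": "Geoffrey Perkins", "count": 1},
                {"term": "Harper Collins", "count": 0},
            ],
        }
    }
}

with_docs, without_docs = split_facet_terms(sample, "publisher")
print(with_docs)     # ['Geoffrey Perkins']
print(without_docs)  # ['Harper Collins']
```

If the removed publisher shows up with a count above 0 under match_all, some documents still carry it.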

clint

Looks good to me.

Indeed!

I thought it might be caused by the child documents or the custom analyzer,
but it is neither.
I have no clue what's going on.

I just updated to 0.90.0-RC1 and rebuilt the index. It still doesn't work
as expected with the real index, whereas it does with the updated gist:
https://gist.github.com/themouette/5221705

The only differences are the names, the number of fields (113) and the
number of documents (15k main documents and 1.5M child documents).

Thanks for your answers, everybody.
