Terms facet all_terms does not work


(Mustafa Sener) #1

Hi,
I want to get all terms of a field. For this purpose I used the all_terms
property of the terms facet. However, it does not work. In my test I had 1000
distinct terms for a field, but when I executed the terms facet with the
"all_terms":true parameter, it returned only the first 10 terms. Is all_terms
deprecated?

--
Mustafa Sener
www.ifountain.com


(Shay Banon) #2

all_terms is a bad name... It basically means that you will get back terms
with 0 count as well. There is no option to get all terms back. Open
an issue?
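
To make the distinction concrete, here is a minimal sketch in plain Python. It is a toy model, not Elasticsearch code: the function name `terms_facet` and the `size` cutoff are assumptions for illustration (10 simply mirrors the number of terms reported above), and only the `all_terms` name comes from this thread.

```python
from collections import Counter


def terms_facet(hit_docs, index_terms, size=10, all_terms=False):
    """Toy model of a terms facet; not Elasticsearch code.

    hit_docs: list of term lists, one per document matched by the query.
    index_terms: every distinct term stored for the field.
    size: assumed cutoff on returned entries (10 mirrors the thread).
    all_terms: if True, also report terms that no hit contains (count 0).
    """
    # Count each term once per matching document.
    counts = Counter(term for doc in hit_docs for term in set(doc))
    if all_terms:
        for term in index_terms:
            counts.setdefault(term, 0)
    # Highest count first, ties broken alphabetically, then truncate.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:size]


index_terms = [f"term{i:04d}" for i in range(1000)]
hits = [["term0001", "term0002"], ["term0001"]]

without = terms_facet(hits, index_terms)
with_zero = terms_facet(hits, index_terms, all_terms=True)
print(len(without), len(with_zero))
```

In this model, turning on `all_terms` only pads the result with zero-count terms; the cutoff still applies, so a field with 1000 distinct terms still yields a short list.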


(project2501) #3

Hi,
I was going to ask about this as well. I use the TermsComponent in
Solr and am migrating to ES.
The Solr TermsComponent returns all the terms in the index (for a
field) along with their frequency counts.

From the above reply, it seems ES does not have a similar feature, so I
vote to open an issue to expose
this Lucene capability in a similar fashion.


(Loco Jay) #4

+1 for issue (using 2 calls at moment)
+1 for the field cache being able to go to disk and use less memory


(Ivan Brusic) #5

Let me jump aboard and say that I am also looking for similar capabilities.

I am currently evaluating the feasibility of converting a modified Lucene
project to ElasticSearch, and the functionality that I am not able to
replicate is the use of TermDocs. ElasticSearch has its own version of
FieldCache, and I am currently looking at what precisely it
contains and whether it can be exposed. Having the ability to retrieve all
terms for a field would eliminate the need to access the FieldCache,
especially since I would need to execute some warmup queries in order
to populate it.

Ivan


(Mustafa Sener) #6

Hi,
I created an issue

--
Mustafa Sener
www.ifountain.com


(Shay Banon) #7

What exactly are you looking for when working with TermDocs?


(Ivan Brusic) #8

TermDocs is used simply to get all values for a specific term.
The existing Lucene infrastructure uses this information at startup to
pre-calculate/cache various properties of the system and to provide
some type of faceting. I am not looking for TermDocs access in
ElasticSearch, but a method to get all terms would be nice.


(Shay Banon) #9

There used to be a terms API in elasticsearch that returned all terms for a
field, but it was not properly implemented (i.e. it did not support paginating
through the terms while still having a consistent view, similar to scrolling),
so it was removed. We can try and implement it again properly...
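
As a rough illustration of what "paginating with a consistent view" could mean, the sketch below pages through a frozen, sorted copy of a field's terms using the last term seen as a cursor. This is only an assumption about one possible approach, not how elasticsearch implemented or plans to implement it; the class `TermsSnapshot` and its `page` method are hypothetical names.

```python
import bisect


class TermsSnapshot:
    """Immutable, sorted copy of a field's terms taken at one point in time."""

    def __init__(self, terms):
        self._terms = tuple(sorted(set(terms)))

    def page(self, after=None, size=3):
        """Return up to `size` terms strictly greater than `after`."""
        start = 0 if after is None else bisect.bisect_right(self._terms, after)
        return list(self._terms[start:start + size])


live_terms = {"apple", "banana", "cherry", "date", "fig"}
snapshot = TermsSnapshot(live_terms)

pages, cursor = [], None
while True:
    chunk = snapshot.page(after=cursor, size=2)
    if not chunk:
        break
    pages.append(chunk)
    cursor = chunk[-1]
    # A write that happens mid-iteration does not change the snapshot.
    live_terms.add("aardvark")

print(pages)
```

Because every page is read from the same snapshot, terms added while the client is iterating neither appear halfway through nor shift the page boundaries.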


(project2501) #10

That would be great! Thanks.

Also, I'm not sure if Lucene supports this, but it would be cool if
the terms feature could work for search results too, thereby limiting
the term vectors to only those contained in the result documents.
The current behavior I see (from using Solr's terms component) is that
you can only get terms from the whole index.
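
The difference between the two behaviors can be sketched with a toy inverted index in plain Python. Nothing here is Lucene, Solr, or ES code; `build_postings`, `term_counts`, and the sample documents are hypothetical, chosen only to show whole-index counts next to counts restricted to a result set.

```python
def build_postings(docs):
    """Map each term to the set of document ids that contain it."""
    postings = {}
    for doc_id, terms in docs.items():
        for term in terms:
            postings.setdefault(term, set()).add(doc_id)
    return postings


def term_counts(postings, restrict_to=None):
    """Document frequency per term, optionally limited to some doc ids."""
    counts = {}
    for term, doc_ids in postings.items():
        matched = doc_ids if restrict_to is None else doc_ids & restrict_to
        if matched:
            counts[term] = len(matched)
    return counts


docs = {1: ["red", "blue"], 2: ["red"], 3: ["green", "blue"]}
postings = build_postings(docs)

whole_index = term_counts(postings)                    # every term in the field
results_only = term_counts(postings, restrict_to={2})  # only terms in the hits
print(whole_index, results_only)
```

Here `restrict_to` stands in for the set of document ids returned by a search, so terms that occur only outside the results drop out of the listing.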

