On May 8, 2012, at 5:40 PM, Crwe wrote:
Hello all,
I am new to ES and have two questions:
I have an "author" field, which I want to be searchable using the
default analyzer (~tokens). But I also want to be able to return facet
counts for this field unanalyzed, as a whole string, no tokens.
It seems to work, but is this the right way to do it? Is there a
better way to make a field token-searchable, but also return facets
for the whole string at the same time?
For question 1: I don't see any problem with that solution. In my opinion, 'multi_field' fits your requirement perfectly.
For question 2: yes, it's related to question 1 ;).
In the term facet, you asked for the 'author.untouched' version of the 'author' field. So no analyzer is applied, and you get facet values on the whole field.
In the query_string, you queried on the 'author' field, which is the 'author.author' version in your multi_field declaration. The standard analyzer is applied in this case.
So the query input "W. Ellis Penning" will be broken into the tokens "W", "Ellis", "Penning". Any author whose name matches ANY of these tokens will be considered a match.
Hope this helps.
Regards,
LTVP
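For readers following along, here is a minimal sketch of the kind of multi_field mapping being discussed, written as a Python dict. The original mapping is not shown in this thread, so the sub-field layout below is an assumption based only on the names 'author' and 'author.untouched' used in the reply. `rough_tokens` is a hypothetical stand-in for an analyzer, not the real standard analyzer.

```python
import json
import re

# Assumed mapping layout for a legacy "multi_field"; only the field names
# "author" and "author.untouched" come from the thread.
author_mapping = {
    "author": {
        "type": "multi_field",
        "fields": {
            # default sub-field: analyzed, addressed as "author"
            "author": {"type": "string", "index": "analyzed"},
            # whole-string sub-field: addressed as "author.untouched"
            "untouched": {"type": "string", "index": "not_analyzed"},
        },
    }
}


def rough_tokens(text):
    """Very rough analyzer stand-in: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


name = "W. Ellis Penning"
print(json.dumps(author_mapping, indent=2))
print(rough_tokens(name))  # ['w', 'ellis', 'penning'] -> terms searched via "author"
print([name])              # ['W. Ellis Penning'] -> single term faceted via "author.untouched"
```

The point of the two sub-fields is that searches see the tokens while the facet sees one term per author string.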
On May 8, 2012, at 6:10 PM, Crwe wrote:
Hello LTVP,
thanks for the quick reply!
I don't think that's it. Firstly, the query is in double quotes "",
which I understand should search for the entire phrase.
Secondly, all four returned documents indeed contain "W. Ellis
Penning" verbatim, inside the author list. Why two of them were not
counted in the facet remains a mystery to me.
I've been trying to create a minimum failing example for the past two
hours, but failed.
When I enter the same four documents (and nothing else) into a new
testing index, the facet count matches the search hit count, like I
would expect.
I also tried deleting the index and re-indexing everything, using the
same script as before. Now the problem with "W. Ellis Penning" is gone
(facet returns count=4 as it should), but Mr. "Alexander Pasko"
returns facet count=4 while a direct query returns count=6.
So the issue seems to pop up randomly, with different records.
LTVP, I'll try sending you the dataset and the script privately
(unfortunately it's not public). Could you please try replicating
this? It's driving me crazy.
Apart from the dataset, let's keep the discussion here, in public.
It would be great if you could gist a recreation of your problem, as described at http://www.elasticsearch.org/help/
I'm interested in investigating it further.
LTVP
Kimchy's comment in the ticket https://github.com/elasticsearch/elasticsearch/issues/1305 makes sense. I quote it here:
"Right, the way top N facets work now is by getting the top N from each shard, and merging the results. This can give inaccurate results. The phase 3 thingy is not really a solution, will read the paper though :)"
I think the root cause of the problem is the naive top N facet merging.
In this query:
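The original query is not reproduced here. As a rough illustration of the merging behaviour described in the ticket, the following sketch simulates two shards that each report only their local top N terms. The shard contents are invented (they are not the private dataset from this thread) and were chosen so that one author shows a merged count of 4 against a true count of 6. The function names are hypothetical.

```python
from collections import Counter


def shard_top_n(shard_terms, n):
    """Top n (term, count) pairs as a single shard would report them."""
    counts = Counter(shard_terms)
    # Sort by count descending, then by term, so ties are deterministic.
    return sorted(counts.items(), key=lambda tc: (-tc[1], tc[0]))[:n]


def merged_top_n(shards, n, shard_size=None):
    """Merge per-shard top lists; a term cut off on a shard adds 0 for that shard."""
    shard_size = n if shard_size is None else shard_size
    merged = Counter()
    for shard_terms in shards:
        for term, count in shard_top_n(shard_terms, shard_size):
            merged[term] += count
    return sorted(merged.items(), key=lambda tc: (-tc[1], tc[0]))[:n]


# Invented data: one facet term per document, split over two shards.
shards = [
    ["Author A"] * 5 + ["Author B"] * 4 + ["Alexander Pasko"] * 2,
    ["Alexander Pasko"] * 4 + ["Author A"] * 3 + ["Author B"] * 1,
]
true_counts = Counter(term for shard_terms in shards for term in shard_terms)

print(merged_top_n(shards, n=2))
# [('Author A', 8), ('Alexander Pasko', 4)]
print(true_counts["Alexander Pasko"])
# 6
print(merged_top_n(shards, n=2, shard_size=10))
# [('Author A', 8), ('Alexander Pasko', 6)]
```

In this toy setup the undercount disappears once each shard returns more terms than are finally displayed, and it also explains why a single small test index (one shard's worth of data) would not show the problem.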
If I understand correctly, I have three options now (displaying wrong
counts to users is a show stopper for me, so that is not an option):
1. ask for a huge ".size" in the facet query options, trim to top 10
   manually, and hope the issue doesn't come up anymore
2. use only one shard
3. instead of relying on counts coming from facets, run another direct
   query for each of the top 10 facet values, and use those hit counts as
   the facet count.
Nr. 3 means ten extra ES requests (performance hit? how big?)
Nr. 2: I'm not sure what that would mean re. performance and maintenance.
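Options 1 and 3 could look roughly like the sketch below, which only builds request bodies and trims a response; it does not talk to a cluster. The facet name "authors", the oversize value of 1000, and the helper names are all hypothetical, and the request shapes assume the legacy terms facet, term filter and filtered query of that era, so check them against the documentation for your version.

```python
FACET_FIELD = "author.untouched"  # whole-string sub-field from the thread


def oversized_facet_body(query, wanted=10, oversize=1000):
    """Option 1: request far more facet terms than will be displayed."""
    return {
        "query": query,
        "facets": {
            "authors": {  # hypothetical facet name
                "terms": {"field": FACET_FIELD, "size": max(wanted, oversize)}
            }
        },
    }


def trim_facet(response, wanted=10):
    """Keep the first `wanted` facet entries (assumed sorted by count)."""
    return response["facets"]["authors"]["terms"][:wanted]


def recount_bodies(query, top_terms):
    """Option 3: one extra search per facet value, limited to that exact author."""
    return [
        {
            "query": {
                "filtered": {
                    "query": query,
                    "filter": {"term": {FACET_FIELD: entry["term"]}},
                }
            }
        }
        for entry in top_terms
    ]


# Demo with a made-up response of the assumed shape.
fake_response = {
    "facets": {
        "authors": {
            "terms": [
                {"term": "a", "count": 3},
                {"term": "b", "count": 2},
                {"term": "c", "count": 1},
            ]
        }
    }
}
user_query = {"match_all": {}}
top = trim_facet(fake_response, wanted=2)
print(oversized_facet_body(user_query)["facets"]["authors"]["terms"]["size"])  # 1000
print([entry["term"] for entry in top])  # ['a', 'b']
print(len(recount_bodies(user_query, top)))  # 2
```

Option 1 reduces the chance of a wrong count but does not rule it out, while the per-value searches in option 3 take their hit totals from the searches themselves rather than from merged top lists.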