Hi,
We are using Elasticsearch to index a lot of HTML documents (right now around 90k; the index size is around 2 GB). Everything is great, we love Elasticsearch!
But we noticed one thing we cannot understand. We use the following search:
{
  "from" : 0, "size" : 10,
  "sort" : [
    {"time" : "desc"}
  ],
The search finds the right documents, but the highlight field is sometimes filled and sometimes empty (the text field is also stored as a field, as we had to do before 0.14.0). If I search for different words that all find the same document, the highlight is there for some words and missing for others, even though it is the same document being returned.
Any ideas why?
Thanks a lot, and keep up the great work!
MAX
-shay.banon
(in reply to max, Tuesday, January 18, 2011 at 2:48 PM)
Hi,
thank you for the fast response!
We did a "./gradlew clean release" and installed the resulting 0.15.0-SNAPSHOT, but we still have the same problem: sometimes the highlight is there, sometimes not. Any ideas why?
How do you analyze the HTML source text, and how do you store it in ES? Are you sure you removed all HTML tags before highlighting takes place?
Regards,
Lukas
(in reply to max <max@kossatz.com>, Fri, Jan 21, 2011 at 11:06 AM)
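Lukas's question is whether all markup is gone before the text reaches the index (and therefore the highlighter). As an illustration only, here is a minimal sketch of client-side tag stripping with Python's standard-library `html.parser`, assuming documents are cleaned before being sent to Elasticsearch. The name `strip_html` and the chosen set of block-level tags are hypothetical; this is not max's actual pipeline and not an Elasticsearch feature.

```python
from html.parser import HTMLParser

# Hypothetical choice: tags after which a space is inserted so that
# adjacent paragraphs/cells do not run together ("a</p><p>b" -> "a b").
_BLOCK_TAGS = {"p", "div", "br", "li", "tr", "td", "th",
               "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects text content, dropping tags and <script>/<style> bodies."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turns &amp; into &
        self._parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in _BLOCK_TAGS:
            self._parts.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in _BLOCK_TAGS:
            self._parts.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self) -> str:
        # Collapse the whitespace runs left behind by removed markup.
        return " ".join("".join(self._parts).split())


def strip_html(html: str) -> str:
    """Return the visible text of an HTML document with all tags removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    doc = "<html><body><p>Hello <b>world</b> &amp; friends</p></body></html>"
    print(strip_html(doc))  # Hello world & friends
```

Inline tags such as `<b>` are removed without adding a space, so words split across inline markup stay whole; block-level tags get a space so their contents remain separate tokens.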
Yes, all HTML tags are removed.
The "funny" thing is that, for the same document, the highlight is or is not in the search result depending on the search term. The document itself is always found.
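To pin down the "same document, different search term" pattern, one option is to run the same search for several terms and record which hits come back without fragments. Below is a minimal sketch, assuming the search response has already been parsed into a Python dict and that the highlighted field is named `text` (an assumption based on "the text-field" above). It relies only on the `hits.hits[].highlight` structure of a search response and sends no requests itself. The helper name `hits_without_highlight` is hypothetical.

```python
import json


def hits_without_highlight(response: dict, field: str) -> list:
    """Return the _id of every hit that has no highlight fragments for `field`.

    A hit counts as missing if its "highlight" object is absent, lacks
    `field`, or holds an empty fragment list.
    """
    missing = []
    for hit in response.get("hits", {}).get("hits", []):
        fragments = hit.get("highlight", {}).get(field, [])
        if not fragments:
            missing.append(hit.get("_id"))
    return missing


if __name__ == "__main__":
    # Hypothetical response body, shaped like a search result.
    sample = json.loads("""
    {"hits": {"total": 2, "hits": [
      {"_id": "1", "highlight": {"text": ["... <em>foo</em> ..."]}},
      {"_id": "2"}
    ]}}
    """)
    print(hits_without_highlight(sample, "text"))  # ['2']
```

Comparing the returned IDs across search terms shows whether the same document loses its highlight only for particular words.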