Highlight works on doc_text but not on doc.text for non-stored fields


(andym) #1

Hi,
I am running into a strange problem: highlighting appears not to work
on fields that have “.” in their names when the field is not stored.

Repro steps:

  1. Install ES 0.19.2

  2. Create an index (say, called h)

  3. Put the mapping (PUT to h/xx/_mapping)
    {
      "xx": {
        "_all": { "enabled": "false" },
        "_source": { "compress": "true" },
        "properties": {
          "doc_text": { "type": "string", "store": "no", "index": "analyzed", "include_in_all": "false" },
          "doc.text": { "type": "string", "store": "no", "index": "analyzed", "include_in_all": "false" }
        }
      }
    }

  4. Create a simple document with two fields, one with "." in the field
    name and one without (PUT to h/xx/1)
    {
      "doc_text": "hello world",
      "doc.text": "hello world"
    }

  5. Issue a search query on the doc_text field with highlighting for
    doc_text, and observe that the results come back with highlights
    (POST to h/xx/_search)
    {
      "query": {
        "query_string": {
          "default_field": "doc_text",
          "query": "world"
        }
      },
      "highlight": {
        "pre_tags": [ "" ],
        "post_tags": [ "" ],
        "fields": {
          "doc_text": {
            "fragment_size": 99999,
            "number_of_fragments": 9
          }
        }
      }
    }

  6. Issue a search query on the doc.text field with highlighting for
    doc.text, and observe that the results come back without highlights
    (POST to h/xx/_search)
    {
      "query": {
        "query_string": {
          "default_field": "doc.text",
          "query": "world"
        }
      },
      "highlight": {
        "pre_tags": [ "" ],
        "post_tags": [ "" ],
        "fields": {
          "doc.text": {
            "fragment_size": 99999,
            "number_of_fragments": 9
          }
        }
      }
    }

Is this a known issue (I searched the forums but could not find anything),
or am I doing something wrong?
Note that if in step 3 I specify the fields to be stored, as below,
everything works as expected.

{
  "xx": {
    "_all": { "enabled": "false" },
    "_source": { "compress": "true" },
    "properties": {
      "doc_text": { "type": "string", "store": "yes", "index": "analyzed", "include_in_all": "false" },
      "doc.text": { "type": "string", "store": "yes", "index": "analyzed", "include_in_all": "false" }
    }
  }
}
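
To make the difference between steps 5 and 6 explicit: the two requests are identical apart from the field name. Below is a minimal Python sketch that only builds both bodies. The helper name highlight_search_body and the <em>/</em> tag values are placeholders chosen for illustration (they are not taken from the post above), and nothing here talks to a real cluster.

```python
import json


def highlight_search_body(field, text, pre_tag="<em>", post_tag="</em>"):
    """Build the search body from steps 5 and 6 for a given field name.

    The tag defaults are illustrative assumptions, not values from the post.
    """
    return {
        "query": {
            "query_string": {"default_field": field, "query": text}
        },
        "highlight": {
            "pre_tags": [pre_tag],
            "post_tags": [post_tag],
            "fields": {
                field: {"fragment_size": 99999, "number_of_fragments": 9}
            },
        },
    }


# Step 5: field name without a dot.
working = highlight_search_body("doc_text", "world")
# Step 6: same request, but the field name contains a dot.
failing = highlight_search_body("doc.text", "world")

print(json.dumps(failing, indent=2))
```

POSTing the second body to h/xx/_search corresponds to step 6, the request that came back without highlights in the repro above.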


(Shay Banon) #2

Can you open an issue? We rely on "." to navigate into JSON objects, but
we can improve this to work well in your case.
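
To illustrate why a literal "." in a field name can trip this up: as I understand it, when a field is not stored its text has to be pulled back out of _source, and a lookup that treats "." as a path separator will go looking for an object "doc" containing a key "text". The sketch below is only a toy model in Python, not the actual ES code. Both function names are hypothetical, and the literal-key fallback is just one possible approach, not necessarily what the eventual fix does.

```python
def extract_by_path(source, path):
    """Walk a parsed JSON object, treating each '.' as a step into a nested object."""
    current = source
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def extract_with_literal_fallback(source, path):
    """Try the whole name as a literal key first, then fall back to path navigation."""
    if isinstance(source, dict) and path in source:
        return source[path]
    return extract_by_path(source, path)


# The document from step 4 of the repro.
source = {"doc_text": "hello world", "doc.text": "hello world"}

plain = extract_by_path(source, "doc_text")        # found: no dot in the name
dotted = extract_by_path(source, "doc.text")       # not found: there is no "doc" object
fallback = extract_with_literal_fallback(source, "doc.text")  # found via the literal key

print(plain, dotted, fallback)
```

With pure path navigation the dotted name yields nothing to highlight, which matches the symptom in the repro.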

On Wed, Apr 18, 2012 at 1:42 AM, andym imwellnow@gmail.com wrote:

Hi,
I am running into a strange problem: highlighting appears not to work
on fields that have “.” in their names when the field is not stored.



(Shay Banon) #3

Opened an issue: https://github.com/elasticsearch/elasticsearch/issues/1875.

On Thu, Apr 19, 2012 at 4:40 PM, Shay Banon kimchy@gmail.com wrote:

Can you open an issue? We rely on "." to navigate into JSON objects, but
we can improve this to work well in your case.



(andym) #4

Hi Shay,

Thanks for the fix!

The reason we are relying on “.” in field names (and are now considering
switching to something else, since there seem to be other problems, e.g.
accessing the field through the source as in _source.doc.text) is that we
ended up with two indexes holding essentially duplicated data. The
first index contains three-level nested documents (e.g. foo.bar.edu),
whereas the second index is flat and contains only leaves, with all fields
from the parent and grandparent duplicated in every leaf. To keep the
names the same between the indexes, we used dots to concatenate
field names in the "flat" index.
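
Roughly, the flattening described above can be sketched like this in Python. This is an illustration only, not our actual indexing code: the function name flatten and the sample field names (name, bar, title, edu, text) are made up, and it assumes every list in a document holds child objects rather than scalar values.

```python
def flatten(doc, prefix=""):
    """Yield one flat dict per leaf.

    Scalar fields of every ancestor are copied into each leaf, and nested
    field names are joined with dots (e.g. "bar.edu.text").
    """
    scalars = {prefix + k: v for k, v in doc.items() if not isinstance(v, list)}
    child_lists = [(k, v) for k, v in doc.items() if isinstance(v, list)]
    if not child_lists:
        # No children: this document is itself a leaf.
        yield scalars
        return
    for key, children in child_lists:
        for child in children:
            for leaf in flatten(child, prefix + key + "."):
                merged = dict(scalars)  # duplicate the ancestor fields
                merged.update(leaf)
                yield merged


# A three-level document: grandparent -> "bar" parents -> "edu" leaves.
nested = {
    "name": "g1",
    "bar": [
        {"title": "p1", "edu": [{"text": "hello"}, {"text": "world"}]},
        {"title": "p2", "edu": [{"text": "again"}]},
    ],
}

leaves = list(flatten(nested))
for leaf in leaves:
    print(leaf)
```

Each resulting leaf is what would go into the "flat" index as one document, with the dotted names matching the paths used in the nested index.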

The reason we need the “flat” index is that in the search results we
need only the leaves that match the conditions (as opposed to all leaves),
and we cannot get that at the moment from our “nested” index (and we need
the nested index because we need to have docs “grouped” by parents).

The idea is to keep this “flat” index until Lucene 3.6/4.0 is
released and that release makes its way into Elasticsearch. In
addition to BlockJoinQuery() (which ES uses for “nested” queries), Lucene
has added a query that works in the other direction:
ToChildBlockJoinQuery() (http://blog.mikemccandless.com/2012/01/tochildblockjoinquery-in-lucene.html).
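
For readers not familiar with the two join directions, here is a toy Python model of the idea. It is not the Lucene API: the function names are hypothetical, and it assumes the block layout used by Lucene's block join, where the children of a parent are indexed contiguously and the parent document comes last in its block.

```python
def to_parent_join(is_parent, child_hits):
    """Map matching child positions to the parent that closes their block."""
    parents = []
    for hit in child_hits:
        pos = hit
        while not is_parent[pos]:  # scan forward to the end of the block
            pos += 1
        if pos not in parents:
            parents.append(pos)
    return parents


def to_child_join(is_parent, parent_hits):
    """Map matching parent positions to all children in their block."""
    children = []
    for hit in parent_hits:
        block = []
        pos = hit - 1
        while pos >= 0 and not is_parent[pos]:  # scan back to the previous parent
            block.append(pos)
            pos -= 1
        children.extend(reversed(block))
    return children


# Two blocks: docs 0-1 are children of parent 2, doc 3 is a child of parent 4.
is_parent = [False, False, True, False, True]

parents_of_hits = to_parent_join(is_parent, [1, 3])
children_of_first = to_child_join(is_parent, [2])
children_of_second = to_child_join(is_parent, [4])

print(parents_of_hits, children_of_first, children_of_second)
```

The first function models the child-to-parent direction, the second the parent-to-child direction that the flat index currently stands in for.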

The reason I bring this up is that we partially back-ported
ToChildBlockJoinQuery() into a recent fork of ES but ultimately
abandoned the effort and went for the second index instead: it is not
quite clear what the best way is to expose results from this query
through the ES JSON, there could be other issues we don’t know about,
and the overhead of an extra index is not that big. In any case,
ToChildBlockJoinQuery() will likely have to be ported no
matter what once ES moves to the next version of Lucene, and I’d
be happy to contribute that porting work by sharing it on GitHub.
Let me know if you’re interested.

Thanks,

-- Andy

On Apr 19, 10:04 am, Shay Banon kim...@gmail.com wrote:

Opened an issue: https://github.com/elasticsearch/elasticsearch/issues/1875.

