Inconsistent query behavior with multi_field and string value


(Adam Zell) #1

I am seeing inconsistent behavior in ES 0.90.7 when storing a string
directly in a multi_field, versus as a hash of attributes. Relevant gists:

https://gist.github.com/azell/7746233
https://gist.github.com/azell/7746253

The only difference in the above two gists is the JSON document published
to ES. In the former, the string is directly associated with its value:

"name" : "test me"

In the latter, the string is associated with a hash which contains its
value:

"name" : { "value" : "test me" }

Searching on the non-default multi_field sub-field ("name.untouched")
returns 1 result in the former case, but 0 in the latter. Is this
intended? For reference, the documentation at
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#string
says:

The string type also supports custom indexing parameters associated with the
indexed value. For example:

{
    "message" : {
        "_value": "boosted value",
        "_boost": 2.0
    }
}

The mapping is required to disambiguate the meaning of the document.
Otherwise, the structure would interpret "message" as a value of type
"object". The key _value (or value) in the inner document specifies the
real string content that should eventually be indexed. The _boost (or boost)
key specifies the per field document boost (here 2.0).
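
For readers without access to the gists, the setup can be sketched roughly as follows. This is only an illustration: the type name "doc", the sub-field settings, and the term query are assumptions, not copied from the gists. The only parts taken from this post are the two document shapes and the "name.untouched" sub-field name.

```python
import json

# Assumed ES 0.90-style multi_field mapping; the type name "doc" and the
# analyzed / not_analyzed settings are guesses, not taken from the gists.
mapping = {
    "doc": {
        "properties": {
            "name": {
                "type": "multi_field",
                "fields": {
                    "name": {"type": "string", "index": "analyzed"},
                    "untouched": {"type": "string", "index": "not_analyzed"},
                },
            }
        }
    }
}

# The two document shapes described above.
doc_plain = {"name": "test me"}
doc_hash = {"name": {"value": "test me"}}

# Hypothetical query against the non-default sub-field.
query = {"query": {"term": {"name.untouched": "test me"}}}

for label, body in (("mapping", mapping), ("plain", doc_plain),
                    ("hash", doc_hash), ("query", query)):
    print(label, json.dumps(body))
```

Both documents carry the same string, so one would expect the same query to match either of them.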

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7d7fef21-b8fd-4585-9ff9-6df0b4d2734d%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Adam Zell) #2

Copy and paste from
https://github.com/elasticsearch/elasticsearch/issues/4320:

The culprit looks to be MultiFieldMapper.parse. In theory, the code should
rewind the ParserContext to the START_OBJECT token before calling
Mapper.parse. As it is, the second and later Mapper objects receive a
ParserContext stuck at the END_OBJECT token, as the first mapper has
consumed additional tokens.

Is there a simple way to mark and reset XContentParser, similar to an
InputStream?
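
To make the suspected failure mode concrete, here is a toy model in Python. It is not Elasticsearch code: ForwardOnlyParser and parse_string_field are hypothetical stand-ins for XContentParser and a per-sub-field Mapper.parse, and replaying a buffered token list is just one possible workaround, assumed here for illustration.

```python
# Toy model of a forward-only token parser; not Elasticsearch's XContentParser.
START_OBJECT, FIELD_NAME, VALUE_STRING, END_OBJECT = (
    "START_OBJECT", "FIELD_NAME", "VALUE_STRING", "END_OBJECT")


class ForwardOnlyParser:
    """Hands out tokens in order, with no way to go back."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._pos = 0

    def current(self):
        return self._tokens[self._pos]

    def advance(self):
        # Move forward one token, stopping at the last one.
        if self._pos < len(self._tokens) - 1:
            self._pos += 1
        return self.current()


def parse_string_field(parser):
    """Hypothetical mapper: reads {"value": "..."} starting at START_OBJECT."""
    kind, _ = parser.current()
    if kind != START_OBJECT:
        return None  # parser is not positioned where this mapper expects
    value = None
    kind, text = parser.advance()
    while kind != END_OBJECT:
        if kind == FIELD_NAME and text in ("value", "_value"):
            _, value = parser.advance()
        kind, text = parser.advance()
    return value


def tokens_for_name_object():
    # Token stream for: "name" : { "value" : "test me" }
    return [(START_OBJECT, None), (FIELD_NAME, "value"),
            (VALUE_STRING, "test me"), (END_OBJECT, None)]


# Shared parser: the first sub-field consumes the whole object, so the
# second sub-field starts at END_OBJECT and finds no value.
shared = ForwardOnlyParser(tokens_for_name_object())
shared_results = [parse_string_field(shared) for _ in ("name", "untouched")]

# Replaying the buffered tokens for each sub-field gives both the value.
buffered = tokens_for_name_object()
replayed_results = [parse_string_field(ForwardOnlyParser(buffered))
                    for _ in ("name", "untouched")]

print(shared_results)
print(replayed_results)
```

In this model the shared parser yields a value only for the first sub-field, which would match the symptom of "name.untouched" returning 0 results for the hash form.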

On Sunday, December 1, 2013 11:55:49 PM UTC-8, zellster wrote:

I am seeing inconsistent behavior in ES 0.90.7 when storing a string
directly in a multi_field, versus as a hash of attributes. Relevant gists:

https://gist.github.com/azell/7746233
https://gist.github.com/azell/7746253



