Dear List
I'm quite confused about the index_name option in the mapping definition.
While I think I understand its purpose, and I can see its effect, I'd like
to understand more about how it works internally: which names are just
aliases and which are real entities.
I have read the index_name documentation in the object type description [1]
and in the core types definition [2], as well as a few mailing list posts,
most notably [3].
In [2] the documentation says the following:
index_name: "The name of the field that will be stored in the index.
Defaults to the property/field name"
Does this mean that the value of index_name is the real identifier ES uses
internally for the field, i.e. the name of the Lucene field in which the
values are actually stored?
As an example:
- create an index:
curl -X POST 'http://localhost:9200/other_index'
- add a few interesting types to it:
curl -X PUT -d \
'{"integer_type":{"dynamic":"false","properties":{"xyz":{"type":"integer",
"index_name":"integer_xyz"}, "name":{"type": "string"}}}}' \
'http://localhost:9200/other_index/integer_type/_mapping'
curl -X PUT -d \
'{"geo_type":{"dynamic":"false","properties":{"xyz":{"type":"geo_point",
"index_name":"geo_xyz"}, "name":{"type": "string"}}}}' \
'http://localhost:9200/other_index/geo_type/_mapping'
curl -X PUT -d \
'{"string_type":{"dynamic":"false","properties":{"xyz":{"type":"string",
"index_name":"string_xyz"}, "name":{"type": "string"}}}}' \
'http://localhost:9200/other_index/string_type/_mapping'
curl -X PUT -d \
'{"xyz":{"dynamic":"false","properties":{"woodoo":{"type":"string",
"index_name":"string_xyz"}, "name":{"type": "string"}}}}' \
'http://localhost:9200/other_index/xyz/_mapping'
At this point I have 4 types, integer_type, geo_type, string_type and xyz,
each with a document indexed into it (the _source of each shows up in the
responses below).
Some interesting facts about these types: integer_type, geo_type and
string_type all have an xyz property, but its datatype is respectively an
integer, a geo point and a string. To work around this problem, each xyz
property is defined with a different index_name so that, even though the
types are incompatible, they don't step on each other's feet.
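The three colliding mappings above all follow one pattern, which is also what I'd like to apply to third-party documents. As a sketch, here is a hypothetical bash helper (mapping_body is my own name, not an ES tool) that prints such a mapping body from a type name, a field name, its datatype and the index_name to store it under:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Elasticsearch: print a mapping body like the
# ones above, where FIELD of datatype DATATYPE is stored under INDEX_NAME,
# "name" is a plain string field, and new fields are not added to the
# mapping ("dynamic":"false").
mapping_body() {
  local type=$1 field=$2 datatype=$3 index_name=$4
  printf '{"%s":{"dynamic":"false","properties":{"%s":{"type":"%s","index_name":"%s"},"name":{"type":"string"}}}}\n' \
    "$type" "$field" "$datatype" "$index_name"
}

# Same PUT as for geo_type above, with the body generated instead of typed:
# curl -X PUT -d "$(mapping_body geo_type xyz geo_point geo_xyz)" \
#   'http://localhost:9200/other_index/geo_type/_mapping'
```

The index_name is passed explicitly rather than derived from the datatype because the geo_type mapping above uses geo_xyz, not geo_point_xyz.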
Now let's see some queries
a) curl -X POST -d '{"query":{"term":{"xyz":1}}}' \
'http://localhost:9200/other_index/_search'
{"took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"other_index","_type":"integer_type","_id":"J7G7QZHsTpmjSNB66L9Vow","_score":1.0,
"_source" : {"xyz": 1, "name": "the integer name"}}]}}
So this (obviously ill-conceived and dangerous) search seems to resolve
"xyz" to the first definition of the field, the integer one, and returns
only the integer result.
Let's see some saner queries:
b) curl -X POST -d '{"query":{"term":{"string_type.xyz":1}}}' \
'http://localhost:9200/other_index/_search'
{"took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.625,"hits":[{"_index":"other_index","_type":"string_type","_id":"TZvWm1NCQYmbuv-NIKBZYw","_score":0.625,
"_source" : {"xyz": "this is a string 1", "name": "the string name"}}]}}
As expected, I get back 1 result from the string type, since it is
explicitly requested in the query.
c) curl -X POST -d '{"query":{"term":{"integer_type.xyz":1}}}' \
'http://localhost:9200/other_index/_search'
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"other_index","_type":"integer_type","_id":"J7G7QZHsTpmjSNB66L9Vow","_score":1.0,
"_source" : {"xyz": 1, "name": "the integer name"}}]}}
Again, as expected, I get the integer type document, as specified in the
query.
d) curl -X POST -d '{"query":{"term":{"xyz.woodoo":1}}}' \
'http://localhost:9200/other_index/_search'
{"took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.625,"hits":[{"_index":"other_index","_type":"xyz","_id":"H1DZL3-zTbSuLYdaruanPw","_score":0.625,
"_source" : {"woodoo": "woodoo 1", "name": "the woodoo name"}}]}}
This is my xyz type document, as expected, but I have a question: how does
ES look up xyz as a type? In query a) I used xyz as a field, so how does ES
decide, and what is the lookup order between types and fields when it
resolves a query?
e) curl -X POST -d '{"query":{"term":{"string_xyz":1}}}' \
'http://localhost:9200/other_index/_search'
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":0.625,"hits":[{"_index":"other_index","_type":"string_type","_id":"TZvWm1NCQYmbuv-NIKBZYw","_score":0.625,
"_source" : {"xyz": "this is a string 1", "name": "the string name"}},
{"_index":"other_index","_type":"xyz","_id":"H1DZL3-zTbSuLYdaruanPw","_score":0.625,
"_source" : {"woodoo": "woodoo 1", "name": "the woodoo name"}}]}}
This also looks as expected to me: string_xyz is the index_name both of
field "xyz" in type "string_type" and of field "woodoo" in type "xyz", so
both fields are indexed in the same Lucene field and both documents are
found, and I'm quite happy with that. But now delete the whole index and
start again, and before declaring the 4 types from the beginning, declare
the following type:
curl -X PUT -d \
'{"odd_type":{"dynamic":"false","properties":{"string_xyz":{"type":"string",
"index_name":"another_string"}, "name":{"type": "string"}}}}' \
'http://localhost:9200/other_index/odd_type/_mapping'
With this type defined, query e)
curl -X POST -d '{"query":{"term":{"string_xyz":1}}}' \
'http://localhost:9200/other_index/_search'
returns no results:
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
The logical explanation would be that odd_type won a race on the
resolution of "string_xyz", so a query on "string_xyz" now searches the
"another_string" field. Could someone explain what is happening behind the
scenes: which internal entities are involved, and what are the resolution
priorities between names? Also, if I wanted to reach the other
"string_xyz" (the one shared by string_type.xyz and xyz.woodoo), how would I
do it?
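To make the question concrete, here is a toy bash model of the lookup order I infer from queries a) to e) and from the odd_type run. It is only my guess, not Elasticsearch code: MAPPINGS, first_match and resolve are names I made up. The assumed order is: if the name looks like "type.field" and the prefix is a type, look inside that type (field name, then index_name); otherwise try the field name across all types (first mapping wins), then index_name across all types, then use the name verbatim.

```shell
#!/usr/bin/env bash
# Toy model of the name lookup I *infer* from the queries in this post;
# this is a guess, NOT Elasticsearch code. Each entry is
# "type field_name index_name", in mapping creation order (odd_type first,
# as in the second run).
MAPPINGS=(
  "odd_type string_xyz another_string"
  "integer_type xyz integer_xyz"
  "geo_type xyz geo_xyz"
  "string_type xyz string_xyz"
  "xyz woodoo string_xyz"
)

# first_match KIND NAME [TYPE]: print the index_name of the first mapping whose
# field name (KIND=full) or index_name (KIND=index) equals NAME, optionally
# restricted to TYPE. Returns 1 if nothing matches.
first_match() {
  local kind=$1 name=$2 only_type=${3:-} entry type full index
  for entry in "${MAPPINGS[@]}"; do
    read -r type full index <<< "$entry"
    if [[ -n $only_type && $type != "$only_type" ]]; then
      continue
    fi
    if [[ $kind == full && $full == "$name" ]] || [[ $kind == index && $index == "$name" ]]; then
      echo "$index"
      return 0
    fi
  done
  return 1
}

# resolve NAME: print the Lucene field a term query on NAME would search.
resolve() {
  local name=$1
  # 1. "type.field": look inside that type only, by field name then index_name.
  if [[ $name == *.* ]]; then
    local prefix=${name%%.*} rest=${name#*.}
    if first_match full "$rest" "$prefix" || first_match index "$rest" "$prefix"; then
      return 0
    fi
  fi
  # 2. Field name in any type (first mapping wins), then index_name in any
  #    type, else the name is used verbatim.
  first_match full "$name" || first_match index "$name" || echo "$name"
}

resolve xyz          # integer_xyz: query a), first definition wins
resolve xyz.woodoo   # string_xyz: query d), type prefix
resolve string_xyz   # another_string: odd_type's field name beats index_name
```

If this model is right, the other "string_xyz" should still be reachable through a type prefix, e.g. string_type.xyz or xyz.woodoo, because the prefix is tried before the bare name; that is exactly the part I'd like someone to confirm or correct.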
To explain why I'm asking: I have two purposes. First, learning how ES
works internally; second, I'm looking into building mappings for JSON
documents coming from third parties, with fields that have the same name
but different datatypes. I want to keep the original JSON unchanged but
stay in control of how it maps into the ES index.
[1] http://www.elasticsearch.org/guide/reference/mapping/object-type/
[2] http://www.elasticsearch.org/guide/reference/mapping/core-types/
[3] https://groups.google.com/forum/#!msg/elasticsearch/1bMxI0Jc8Ho/Jn60bWVgb6YJ
Thanks in advance for your patience in reading this, and for your responses.
Paolo
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.