Query behavior for an index with multiple document types and mapping per doc type, bug?


(Garth) #1

Not sure if there is a bug or DHE ( Developer Headspace Error ) in how the
index handles multiple mappings. In short, I have two document types, each
with an explicit mapping, and the "docType" field is mapped differently in
each.
If the type is "string", data is returned as expected. If it's "integer", I
have to change how I query it to get the correct result.

Running on ES: 0.19.8

Here is my example:

Creating the index:

curl -XDELETE 'http://localhost:9200/multi_documents'
curl -XPUT 'http://localhost:9200/multi_documents'
curl -XPUT 'http://localhost:9200/multi_documents/ferret/_mapping' -d '{
"ferret":{
"properties":{
"docType":{ "type":"string" },
"ident_ferret" : { "type" : "integer" }
}
}
}'
curl -XPUT 'http://localhost:9200/multi_documents/wombat/_mapping' -d '{
"wombat":{
"properties" :{
"docType" : { "type" : "integer" },
"ident_wombat" : { "type" : "integer" }
}
}
}'

Loading data:

curl -XPOST 'http://localhost:9200/multi_documents/ferret/ferret-001' -d '{
"doc_id":"ferret-001",
"docType" : "ferret",
"entityType" : "animal",
"ident_ferret" : 200
}'
curl -XPOST 'http://localhost:9200/multi_documents/wombat/wombat-001' -d '{
"doc_id":"wombat-001",
"docType" : 10,
"entityType": "animal",
"ident_wombat" : 100
}'

Query the index with a field of the same type in both document types
( SUCCESS ). Both the "ferret" and "wombat" documents are returned.

curl -XGET 'http://localhost:9200/multi_documents/_search' -d '{
"size":100,
"query":{"field":{"entityType":"animal"}}
}'

Query across the document types ( by not defining them in the URL ) asking
for a docType that is a string ( SUCCESS ).
It returns the "ferret" document.

curl -XGET 'http://localhost:9200/multi_documents/_search' -d '{
"size":100,
"query":{"field":{"docType":"ferret"}}
}'

Query across the document types ( by not defining them in the URL ) asking
for a docType that is an integer ( FAIL ). It should return the "wombat"
document.

curl -XGET 'http://localhost:9200/multi_documents/_search' -d '{
"size":100,
"query":{"field":{"docType":10}}
}'

To make the failing query above work, I have to explicitly prefix the field
with the document type ( SUCCESS ).
It returns the "wombat" document.

curl -XGET 'http://localhost:9200/multi_documents/_search' -d '{
"size":100,
"query":{"field":{"wombat.docType":10}}
}'

Try the unprefixed query again, but specify the document type in the URL
( FAIL ). It should return the "wombat" document.

curl -XGET 'http://localhost:9200/multi_documents/wombat/_search' -d '{
"size":100,
"query":{"field":{"docType":10}}
}'



(Radu Gheorghe) #2

Hello Garth,

This seems to be a "known issue", because all the types within the
same ES index (actually, within the same shard) will end up in the
same Lucene index. Take a look here for some more info about this:
http://www.elasticsearch.org/guide/reference/mapping/

My understanding is that in your case, ES has no [good] way to know
which mapping type you're referring to when you specify "docType" as a
field within your query, unless you specify it explicitly, like you
did with wombat.docType.
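
As a rough sketch of that workaround, the type-prefixed field name can be
built by a small shell helper so the mapping type is always explicit. The
function name build_field_query is made up for illustration and is not part
of ES; the index name, the "field" query, and the wombat.docType form are
taken from the original post. The helper only prints the request body, and
the curl line is left commented out because it needs a running node.

```shell
# Hypothetical helper (not an ES feature): builds a "field" query body whose
# field name is prefixed with the mapping type, e.g. wombat.docType.
# $1 = mapping type, $2 = field name, $3 = value written as a JSON literal
build_field_query() {
  printf '{"size":100,"query":{"field":{"%s.%s":%s}}}\n' "$1" "$2" "$3"
}

# Same body as the query the original post reports as working.
body=$(build_field_query wombat docType 10)
echo "$body"

# To send it to the cluster from the original post (requires a live node):
# curl -XGET 'http://localhost:9200/multi_documents/_search' -d "$body"
```

String values have to be passed already quoted as JSON, for example
'"ferret"', since the helper does no escaping of its own.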

Best regards,
Radu

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

On Wed, Nov 7, 2012 at 5:36 PM, Garth ghershfield.bah@gmail.com wrote:



(system) #3