Complex field name and type name collision


(missinglink) #1

When performing a 'match' query like the one below, Elasticsearch 1.7.2 is not able to distinguish between:

  • a document with _type of 'user' & a field called 'name'
  • a document of any _type with a field called 'user.name'

The following query is used as the example in everything below:

  curl -XGET 'http://localhost:9200/testindex/_search?pretty' -d '{
    "query": {
      "match": {
        "user.name": "john"
      }
    },
    "size": 100
  }'

When I PUT the first document in the index, the query above will return the correct result testindex/guest/1:

curl -XPUT 'http://localhost:9200/testindex/guest/1?pretty' -d '{"user":{"name":"john"}}'
  "hits" : {
    "total" : 1,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "testindex",
      "_type" : "guest",
      "_id" : "1",
      "_score" : 0.30685282,
      "_source":{"user":{"name":"john"}}
    } ]
  }

However, when I PUT a second document, the same query still returns only one document, and it is a different one (testindex/user/1 is returned).

curl -XPUT 'http://localhost:9200/testindex/user/1?pretty' -d '{"user":{"name":"john"}}'
  "hits" : {
    "total" : 1,
    "max_score" : 0.5945348,
    "hits" : [ {
      "_index" : "testindex",
      "_type" : "user",
      "_id" : "1",
      "_score" : 0.5945348,
      "_source":{"user":{"name":"john"}}
    } ]
  }

When I add a third document to the index and repeat the query, I again get a single document back, and again it is a different one (testindex/user/2 is now returned).

curl -XPUT 'http://localhost:9200/testindex/user/2?pretty' -d '{"name":"john"}'
  "hits" : {
    "total" : 1,
    "max_score" : 1.4054651,
    "hits" : [ {
      "_index" : "testindex",
      "_type" : "user",
      "_id" : "2",
      "_score" : 1.4054651,
      "_source":{"name":"john"}
    } ]
  }

This behaviour seems counter-intuitive to me. I would expect the last query to return 3 matches, or at least 2, but only one is returned.

Looking through the docs, it seems to be due to how the internal keys are created for "multilevel objects". The docs say:

Inner fields can be referred to by name (for example, first). To distinguish between two fields that have the same name, we can use the full path (for example, user.name.first) or even the type name plus the path (tweet.user.name.first)

It seems to be ambiguous whether my query "user.name": "john" is asking for an inner field with the path user.name, or for a field called name on documents with a _type of user.

As a result, some data is not returned. Maybe it is trying to de-duplicate the results?

I tested this against the latest HEAD of 5.0.0 and it appears to be fixed. Unfortunately we are still transitioning, and I'm looking for a solution we can use before we migrate up to 2.x and beyond.

One obvious solution would be to change either the _type or the property name so that the collision no longer happens. This is not super easy for us, but it is our current best solution.

Another solution would be to find a way of targeting the field with an absolute path rather than a relative one. In the query I would like to say something like "root.user.name": "john" or "*.user.name": "john". Does such a syntax exist?
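Going by the docs passage quoted above, the closest thing to an absolute path may be the type-qualified form (type name plus path), with one clause per type. The sketch below only builds the request body; build_query is a hypothetical helper, the user.name path is hard-coded for this example, and I have not verified that 1.7.2 resolves the qualified paths as the docs suggest.

```shell
#!/bin/sh
# Hypothetical helper (not part of Elasticsearch): print a bool/should query
# with one match clause per type-qualified path, i.e. <type>.user.name.
# Usage: build_query <value> <type> [<type> ...]
# Assumption: values and type names contain no characters needing JSON escaping.
build_query() {
  value=$1; shift
  clauses=""
  for t in "$@"; do
    clause=$(printf '{"match":{"%s.user.name":"%s"}}' "$t" "$value")
    # Append with a comma separator once the list is non-empty.
    clauses="${clauses:+$clauses,}$clause"
  done
  printf '{"query":{"bool":{"should":[%s]}},"size":100}\n' "$clauses"
}

# Example (needs a running node, so it is left commented out):
# curl -XGET 'http://localhost:9200/testindex/_search?pretty' -d "$(build_query john guest user)"
```

This still needs the list of _types up front, so it shares the drawback of listing the types in the HTTP path.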

The final option I can think of is to specify the target _types in either the query or the HTTP path, such as:

curl -XGET 'http://localhost:9200/testindex/guest,user/_search?pretty'

The query above returns the correct results. However, we have a dynamic list of _types, so doing this would be very difficult, as would changing the query to explicitly specify the target types.
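If the list of _types can be looked up at query time (for example from GET /testindex/_mapping, which returns the mappings keyed by type), building the path itself is mechanical. A minimal sketch, assuming the type names are already available as arguments; search_url is a hypothetical helper name and the host is the one used throughout this post.

```shell
#!/bin/sh
# Hypothetical helper: join the given type names with commas and print the
# search URL for that index.
# Usage: search_url <index> <type> [<type> ...]
search_url() {
  index=$1; shift
  # "$*" joins the arguments with the first character of IFS (a comma here);
  # the IFS change is confined to the command-substitution subshell.
  types=$(IFS=,; printf '%s' "$*")
  printf 'http://localhost:9200/%s/%s/_search?pretty\n' "$index" "$types"
}

# Example (needs a running node, so it is left commented out):
# curl -XGET "$(search_url testindex guest user)" -d '{"query":{"match":{"user.name":"john"}},"size":100}'
```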

Please let me know what you think. Maybe you know of a simple solution to this issue?

I've included a full test case below followed by the output here:


(Ryan Ernst) #2

These types of resolution issues were common in 1.x, and were fixed by https://github.com/elastic/elasticsearch/issues/8872 for 2.0+.

