Ability to configure "root" field for an object type

I have a mapping definition with multiple object types that looks like:

{
    "mappings": {
        "_doc": {
            "properties": {
                "identifier": {
                    "type": "object",
                    "properties": {
                        "id": {
                            "type": "keyword"
                        },
                        "upstream": {
                            "type": "keyword"
                        }
                    }
                },
                "document": {
                    "type": "object",
                    "properties": {
                        "engine": {
                            "type": "keyword"
                        },
                        "content": {
                            "type": "text"
                        }
                    }
                }
            }
        }
    }
}

This will build the fields identifier.id, identifier.upstream, document.engine, and document.content. I would like to allow additional mappings for the "root" fields, i.e. the plain "identifier" and "document" fields.

For the "identifier" field I would like to be able to search across all subfields (without requiring users to specify "identifier.*"). I believe this should be a simple keyword type, with every subfield specifying "copy_to": "identifier". Likewise, I would like to set up the "document" field as an alias field that points to the populated "document.content" field. I tried changing the "type" of each root field to the desired field type but was unable to create the index. Can someone please let me know whether this is achievable?

Update:
I also tried to flatten out the structure myself and do:

"properties": {
    "identifier": {
        "type": "keyword"
    },
    "identifier.id": {
        "type": "keyword",
        "copy_to": "identifier"
    },
    "identifier.upstream": {
        "type": "keyword",
        "copy_to": "identifier"
    },
    "document": {
        "type": "alias",
        "path": "document.content"
    },
    "document.engine": {
        "type": "keyword"
    },
    "document.content": {
        "type": "text"
    }
}

But received the following error:

Failed to parse mapping [_doc]: Can't merge a non object mapping [document] with an object mapping [document]

Hi,
you are already defining mappings for identifier and document, and both are of the object datatype. You should consider using copy_to, but with a different field name than the ones already defined in the mapping. What about

...
"identifier": {
  "type": "object",
  "properties": {
    "id": {
      "type": "keyword",
      "copy_to": "identifier.*"
    },
    "upstream": {
      "type": "keyword",
      "copy_to": "identifier.*"
    }
  }
},
...

which you can query this way:


GET your_index/_search
{
  "query": {
    "match": {
      "identifier.*": "the_id"
    }
  }
}

?
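
To make the "different field name" idea concrete, here is a minimal sketch of the full mapping. The names identifier_all and document_text are hypothetical (they are not in the original mapping, and since JSON has no comments the assumption is only noted here); any name that does not collide with the existing identifier and document objects should behave the same way. It also assumes a version that supports the alias field type, which the flattened attempt above already relies on.

```json
{
  "mappings": {
    "_doc": {
      "properties": {
        "identifier_all": { "type": "keyword" },
        "identifier": {
          "type": "object",
          "properties": {
            "id": { "type": "keyword", "copy_to": "identifier_all" },
            "upstream": { "type": "keyword", "copy_to": "identifier_all" }
          }
        },
        "document_text": { "type": "alias", "path": "document.content" },
        "document": {
          "type": "object",
          "properties": {
            "engine": { "type": "keyword" },
            "content": { "type": "text" }
          }
        }
      }
    }
  }
}
```

The flattened version fails because a dotted name such as "document.content" is expanded into an object called document with a content subfield, so a non-object field that is also called document cannot be merged with it. Keeping the catch-all and alias fields under their own names avoids that conflict.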

So, the primary reason I am doing this is to allow users to perform logical fielded searches rather than having to know all of the various subfields (though the subfields would still be available to them if they really needed them). I am afraid that if I ask users to run their identifier searches via "identifier.*", it leaks too many details of the underlying indexing scheme, which I would like to avoid.

I wanted to get clarification on whether this is possible in Elasticsearch or not (it seems like it is not). This appears to be a limitation of Elasticsearch itself and not of Lucene, as the underlying Lucene index structure can accommodate a "root" field index type.

Since this request is primarily focused around abstracting the underlying index structure from the user (via query_string queries), another option would be to dynamically map fields at query time. Is there a way to indicate to the Elasticsearch query parser to map query fields of "document" to "document.content" or "identifier" to "identifier.*"?

I'm not sure about what information is leaking, as you'll expose the underlying structure anyway by allowing queries on identifier.id and identifier.upstream.
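
That said, if the aim is only to keep the subfield names out of what users type, one option might be for the application to supply the field list itself instead of the user. A rough sketch, assuming the application builds the request body and the user only provides the query text (your_index and the_id are placeholders):

```json
GET your_index/_search
{
  "query": {
    "query_string": {
      "query": "the_id",
      "fields": ["identifier.*"]
    }
  }
}
```

This only covers terms the user enters without a field prefix. It does not rewrite something like identifier:the_id inside the query text, so that case would still need handling in the application or a differently named catch-all field.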

Sure, if we documented the fact that those fields are available then you are correct we would be leaking the information. Now, if we say the only "supported" field that they could search on is simply "identifier" then that provides us flexibility in the future while also not exposing that those underlying fields are actually there.