Elastic mapping: flattened vs field-wise

In our MongoDB we have a collection with multiple field types: arrays, multi-level objects, strings, numbers, dates, etc.
A sample nested field/object looks like: "ValuesDocument": {"field1": "Value1", "field2": {"$numberLong": "value2"}, "field3": "value3"}

What is the best way to store it in Elastic: as a flattened structure for better searching, or with individual field mappings like below? What are your thoughts? Pros and cons of each, if any?

{
  "mappings": {
    "properties": {
      "_id": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      },
      "fieldA": {
        "type": "float"
      },
      "fieldB": {
        "type": "text"
      },
      "fieldC": {
        "type": "keyword"
      },
      "multiLevelDocument": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

I would say that it is best to have each field mapped with the correct data type; the flattened data type should be used when you have an object with a lot of unknown keys.

It should not be used to map the entire document, as mentioned in the documentation:

The flattened mapping type should not be used for indexing all document content, as it treats all values as keywords and does not provide full search functionality. The default approach, where each subfield has its own entry in the mappings, works well in the majority of cases.
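A minimal sketch of that combination (the index name here is just a placeholder, and I am assuming the dynamic object is your ValuesDocument field): the known fields get explicit types, and only the dynamic object is flattened.

```json
PUT my-index
{
  "mappings": {
    "properties": {
      "name":           { "type": "text" },
      "fieldA":         { "type": "float" },
      "fieldC":         { "type": "keyword" },
      "ValuesDocument": { "type": "flattened" }
    }
  }
}
```

This way you keep full-text search and typed queries on the known fields, and the flattened field absorbs whatever keys arrive without creating new mappings.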

Thanks @leandrojmp for the reply and justification. So if I have a field which is an object, and it can have any dynamic keys based on how the user configures it from the UI, and I should also be able to do full-text search on some of the keys/values in that object, how do I go about it?
e.g. see below

ValuesDocument: Object
	key1: "200"
	key2: "CAP-1"
	key3: "[ { "userId": 1, "id": 1, "title": "how are you" ...

ValuesDocument: Object

	key: Object
		_t:    "ValueSelection"
		_id:   "abcd"
		Value: "1"
		v:     "xyz"

Assuming I have a document as below indexed in Elastic:

POST /test_index8/_doc
{
  "name": "Document 1",
  "fieldA": 123.45,
  "fieldB": "Sample text",
  "fieldC": "keyword_value",
  "ValuesDocument": {
    "al6gq": "Group - Corp",
    "acvm8": "Technology",
    "ajkc5": "Data",
    "63c095bebb3a6b02afcb2cbd":  "ABC-369",
    "aucmr" : "he is cute\nhe is naughty\nhe is smart",
    "ab4jh" : "aarav@gmail.com",
    "awb6p" : "1.2.3.4",
    "afl2f" : {
        "_t":    "ValueSelection",
        "_id":   "abc",
        "Value": "1",
        "v":     "pqr"
    }
  }
}

And some of the possible search queries from the UI could be as below. FYI, in my app code I might not be able to specify the exact field inside the ValuesDocument object to search against, so ES should be able to search across all of them.

  • how do I search for contains "Group"
  • how do I search for an exact match for "ABC-369"
  • how do I search for contains "ValueSelection"
  • how do I search for contains "naughty"

I tried both the mappings below for ValuesDocument, and the search queries don't return data as expected.
OPTION 1
{
  "mappings": {
    "properties": {
      "ValuesDocument": {
        "type": "object",  // Use object type for dynamic structures
        "dynamic": true   // Allow Elasticsearch to dynamically map inner fields
      }
    }
  }
}
OPTION 2
{
  "mappings": {
    "properties": {
      "ValuesDocument": {
        "type": "nested" // Use nested type for arrays or deeply nested structures
      }
    }
  }
}

or maybe a 3rd option to just flatten it as below?

"ValuesDocument": {
        "type": "text"
      }

PLEASE suggest, as I am really finding it hard to decide.

Hi Elastic Team,

Can someone reply to this issue? We are very confused about what the correct mapping for this dynamic object type should be, because the data that comes into our ValuesDocument object in MongoDB is very dynamic. We are going to move this data from Mongo to Elastic, and I am looking for the right mapping that helps both storage and retrieval, as we have a search use case and need to be able to do exact matches, full-text search, contains, etc.
The problem with the option below is that, even though it matches the existing MongoDB storage, I would be cautious about using the dynamic feature in a production environment, especially if an index stores records across tenants. The first document indexed defines the mapping, and any other documents that do not conform to the expected data types for the mapping will be rejected. This will be especially apparent if there are ValuesDocument field collisions that have the same short id but different data types.

OPTION 1
{
  "mappings": {
    "properties": {
      "ValuesDocument": {
        "type": "object",  // Use object type for dynamic structures
        "dynamic": true   // Allow Elasticsearch to dynamically map inner fields
      }
    }
  }
}

From our app, basically, I can bring in a text field, a number field, a date field, large comments, and attachment fields, and they get mapped to the ValuesDocument object in Mongo. A sample is below:

POST /test_index/_doc
{
  "name": "Document 1",
  "fieldA": 123.45,
  "fieldB": "Sample text",
  "fieldC": "keyword_value",
  "ValuesDocument": {
    "al6gq": "true",
    "63c095bebb3a6b02afcb2cbd":  "ABC-369",
    "ab4jh" : "anc@gmail.com",
    "awb6p" : "1.2.3.4",
    "afl2f" : {
        "_t":    "ValueSelection",
        "_id":   "abc",
        "Value": "test string",
        "v":     "xyz"
    }
  }
} 

Example of a dynamic field getting mapped as a bool first, then in the next document the field is a string:

{
    "error": {
        "root_cause": [
            {
                "type": "document_parsing_exception",
                "reason": "[1:26] failed to parse field [ValuesDocument.abc] of type [boolean] in document with id 'f7_ACpEBMp0FuQgbXF1U'. Preview of field's value: 'string'"
            }
        ],
        "type": "document_parsing_exception",
        "reason": "[1:26] failed to parse field [ValuesDocument.abc] of type [boolean] in document with id 'f7_ACpEBMp0FuQgbXF1U'. Preview of field's value: 'string'",
        "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Failed to parse value [string] as only [true] or [false] are allowed."
        }
    },
    "status": 400
}

An even more likely case is when a string is detected as a date the first time, but in the next document it actually isn't a date:

{
    "error": {
        "root_cause": [
            {
                "type": "document_parsing_exception",
                "reason": "[1:26] failed to parse field [ValuesDocument.def] of type [date] in document with id 'gb_BCpEBMp0FuQgbZF3f'. Preview of field's value: 'not a date'"
            }
        ],
        "type": "document_parsing_exception",
        "reason": "[1:26] failed to parse field [ValuesDocument.def] of type [date] in document with id 'gb_BCpEBMp0FuQgbZF3f'. Preview of field's value: 'not a date'",
        "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "failed to parse date field [not a date] with format [strict_date_optional_time||epoch_millis]",
            "caused_by": {
                "type": "date_time_parse_exception",
                "reason": "Failed to parse with all enclosed parsers"
            }
        }
    },
    "status": 400
}
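As an aside, if dynamic mapping is kept at all, date detection can be switched off per index so that new string fields are never guessed as dates (the index name is just a placeholder; note this only removes the date trap above, it does not solve the wider type-collision problem):

```json
PUT test_index
{
  "mappings": {
    "date_detection": false
  }
}
```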

Sorry, I didn't see your last questions in this thread.

A couple of things to make this clear.

First, Elasticsearch is not a direct replacement for MongoDB; they work in different ways. MongoDB is schemaless, Elasticsearch is not: you need to have a schema (mappings) defined, as this impacts performance, storage usage, and how you will search your data.

If you do not define a mapping for your data before indexing it, Elasticsearch will by default try to infer the data type of a field the first time it receives a document containing that field. This works reasonably well for string fields, but it can cause issues for other data types like booleans, numeric types, and especially objects.

A wrong mapping leads to the error you shared, a mapping conflict, which makes Elasticsearch reject documents whose field values do not match the expected mapping.

As mentioned in this previous comment, when you have a field that is highly dynamic, the solution is to map this field as flattened. This will allow Elasticsearch to store the entire JSON object, but it will also map every nested field as a keyword, so you will not have full-text search on this field.
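For example, with the sample document shared earlier in the thread, an exact-value term query against the flattened field matches any leaf value:

```json
GET test_index8/_search
{
  "query": {
    "term": { "ValuesDocument": "Group - Corp" }
  }
}
```

The full leaf value matches exactly, but since there is no analysis on flattened values, a term of just "Group" would find nothing.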

One alternative is to also store this dynamic object as a string, so you can at least get some kind of full-text search on it. You could end up with something like this:

On this example I used the following template:

PUT _index_template/discuss
{
  "template": {
    "settings": {
      "number_of_replicas": 0
    },
    "mappings": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "fieldA": {
          "type": "float"
        },
        "fieldB": {
          "type": "text"
        },
        "fieldC": {
          "type": "keyword"
        },
        "ValuesDocument": {
            "type": "flattened"
        },
        "ValuesDocumentAsText": {
          "type": "text"
        }
      }
    }
  },
  "index_patterns": [
    "discuss"
  ],
  "allow_auto_create": true
}

And the following sample document:

POST /discuss/_doc
{
  "name": "Document 1",
  "fieldA": 123.45,
  "fieldB": "Sample text",
  "fieldC": "keyword_value",
  "ValuesDocument": {
    "al6gq": "true",
    "63c095bebb3a6b02afcb2cbd":  "ABC-369",
    "ab4jh" : "anc@gmail.com",
    "awb6p" : "1.2.3.4",
    "afl2f" : {
        "_t":    "ValueSelection",
        "_id":   "abc",
        "Value": "test string",
        "v":     "xyz"
    }
  },
  "ValuesDocumentAsText": "\"ValuesDocument\": { \"al6gq\": \"true\", \"63c095bebb3a6b02afcb2cbd\":  \"ABC-369\", \"ab4jh\" : \"anc@gmail.com\", \"awb6p\" : \"1.2.3.4\", \"afl2f\" : {\"_t\": \"ValueSelection\", \"_id\": \"abc\", \"Value\": \"test string\", \"v\": \"xyz\"}}"
}

How you transform the ValuesDocument field into a string depends entirely on how you are indexing your data.
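As one illustration (Python here just as a sketch; the same idea applies in C# with any JSON serializer, and the helper name is made up for this example), the transformation could look like:

```python
import json

def add_values_document_as_text(doc: dict) -> dict:
    """Copy ValuesDocument into a sibling field as a JSON string.

    Field names match the template shown earlier in the thread;
    the helper itself is illustrative, not part of any client library.
    """
    if "ValuesDocument" in doc:
        doc["ValuesDocumentAsText"] = json.dumps(doc["ValuesDocument"])
    return doc

doc = {
    "name": "Document 1",
    "ValuesDocument": {
        "63c095bebb3a6b02afcb2cbd": "ABC-369",
        "ab4jh": "anc@gmail.com",
    },
}
doc = add_values_document_as_text(doc)
# doc now carries the object twice: once as a dict (indexed as flattened)
# and once as a plain string (indexed as text).
```

The same document is then sent to Elasticsearch with both fields populated.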

Sorry, but a couple of things to clarify. Our source of truth for ACID compliance still stays MongoDB, and to leverage the search capabilities we are moving relevant fields to Elastic. On the mappings, why do we have 2 sections? Are you saying that 1. flattened is to store the object with dynamic key-value fields/types correctly, and 2. text is to allow full-text search on the fields in this object?
So we have C# code which will have the mapping defined for the index. How will the C# mapping for ValuesDocument look? Something like below?

public Dictionary<string, object> ValuesDocument { get; set; }
public string ValuesDocumentAsText { get; set; }

So from my app we have to send the same data to be indexed twice?
What does the below line mean exactly?

Yes, you would have 2 fields with the same data, but one would be mapped as flattened, so it will store the entire JSON object and map every nested field as keyword; it would appear in the way I shared in the previous answer.

And then you would need to store the same data, but as a string, in a field mapped as text; this would allow you to do text queries on this data.

Yes, the flattened data type will allow you to store the dynamic object without having to map each individual field, but each individual field will be mapped as keyword. For example, if you have a nested field in this object that is a date, it will be mapped as keyword, which means that you will not be able to perform date queries on it.

Yes, but the data will just be one long string; there will be no nested fields anymore, as you can see in the screenshot in the previous answer.

I mean that you need to transform the ValuesDocument field into a string before sending it to Elasticsearch. For example, your ValuesDocument is a dictionary in your code; you need to copy it to another field, like ValuesDocumentAsText, and transform it into a string.

No idea how you do that in C#; I create all my mappings directly in Elasticsearch. But if this part of the code sends ValuesDocument as an object and ValuesDocumentAsText as a string, then it should work, as long as the mapping is correct.

Hey @leandrojmp sorry for the delay in response.

Just to summarise, if I understand it correctly, you are proposing storing the field data in Elastic twice: once as a large string of text, and again as a flattened document. Is the string of text for regular keyword search, i.e. exact matches, and the flattened document for "contains" search on a single field? One storage means for each core use case?

Yes, that is what I mentioned before: you will need to store the data twice, with one field mapped as flattened and the other mapped as text.

No, it is the other way around: in the flattened field you will have the dynamic JSON and can search the nested fields for exact matches; the other field would be mapped as text and would contain the JSON object as a string, and on this field you can do full-text search.

This is how Elastic handles these cases; one example is the AWS CloudTrail data, where you can have a couple of fields that are dynamic.

In the example I shared here you would use the field ValuesDocument to do keyword (exact matches) queries, and the field ValuesDocumentAsText to do full text or contains queries.
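Sketching those two query shapes (the index name comes from the template above, and the search terms from the earlier examples):

```json
GET discuss/_search
{
  "query": {
    "term": { "ValuesDocument": "ABC-369" }
  }
}

GET discuss/_search
{
  "query": {
    "match": { "ValuesDocumentAsText": "naughty" }
  }
}
```

The first is an exact match against any leaf value of the flattened field; the second is an analyzed full-text match against the stringified copy.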