Difference on dotted notation vs dictionary

oskrdt · February 5, 2021, 6:31pm

I'm using a Python script to load data into Elasticsearch, with pandas transforming the data beforehand.
I was creating pandas columns named with dotted notation, expecting the Elasticsearch indexing process would transform the dotted names into nested JSON objects, but it seems the fields are stored in _source as individual fields under the same dotted names.

Example: I have the following in Python:

{
    "user.name": "Username",
    "user.email": "email@domain.com"
}

and I was expecting the index process to transform it to

{
    "user": {
        "name": "Username",
        "email": "email@domain.com"
    }
}

Is there any real difference between having individual dotted fields and nested dictionaries in _source?

wylie · February 5, 2021, 6:34pm

You are allowed to use both formats. Fields that are stored as doc values, like keyword or numeric types, are indexed under the dotted path regardless of how _source looks.

The only difference it makes is if you are reading from _source, because _source is the raw data that you sent to Elasticsearch; it is not transformed.
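To illustrate the _source point, here is a minimal sketch of a reader that copes with either shape. The helper name read_field is hypothetical (not part of any Elasticsearch client), and the two dicts simply stand in for the _source of a search hit.

```python
def read_field(source, dotted_path):
    """Look up a field in a _source dict, whether it was sent flat or nested."""
    # Flat form: the dotted name is itself a key, e.g. {"user.name": ...}
    if dotted_path in source:
        return source[dotted_path]
    # Nested form: walk one level per path segment, e.g. {"user": {"name": ...}}
    current = source
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


# Stand-ins for hit["_source"] in the two formats from the question
flat_source = {"user.name": "Username", "user.email": "email@domain.com"}
nested_source = {"user": {"name": "Username", "email": "email@domain.com"}}

print(read_field(flat_source, "user.name"))
print(read_field(nested_source, "user.name"))
```

Both calls return the same value. Note this sketch does not handle mixed documents where a dotted key sits inside a nested object.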

Edit: If you are interested in changing the format automatically, you can set up an ingest pipeline that uses the dot expander processor.
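As a rough sketch, the pipeline body could be built in Python like this, with one dot_expander processor per dotted field. The function name and the description text are made up for illustration; the resulting JSON would be sent to the ingest pipeline API (PUT _ingest/pipeline/<your-pipeline-name>) and then referenced with the pipeline parameter when indexing.

```python
import json


def build_dot_expander_pipeline(dotted_fields):
    """Build an ingest pipeline body with one dot_expander processor per field."""
    return {
        "description": "Expand dotted field names into objects",
        "processors": [
            {"dot_expander": {"field": field}} for field in dotted_fields
        ],
    }


# Field names taken from the example in the question
pipeline = build_dot_expander_pipeline(["user.name", "user.email"])
print(json.dumps(pipeline, indent=2))
```

Listing the fields explicitly is an assumption made here to keep the sketch version-independent; check the processor documentation for your Elasticsearch version for other options.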

stephenb · February 6, 2021, 3:00am

Hi @oskrdt

Just a comment.

As @wylie noted, Elasticsearch supports both formats. However, as someone in the field who sees lots of data and lots of tools, I would encourage you to look at the dot expander, especially if you want to do any further processing in Elasticsearch. I have also found the sub-object form (technically not nested; it is an object) to be more portable and to cause less confusion.
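If you would rather send the object form from the client side, a minimal sketch in plain Python could look like the following. The function expand_dotted is hypothetical, and it assumes each row is a dict such as those produced by DataFrame.to_dict(orient="records").

```python
def expand_dotted(flat):
    """Turn {"user.name": "x"} into {"user": {"name": "x"}}."""
    nested = {}
    for key, value in flat.items():
        parts = key.split(".")
        target = nested
        # Create or descend into one object per leading path segment
        for part in parts[:-1]:
            target = target.setdefault(part, {})
            if not isinstance(target, dict):
                raise ValueError(f"'{key}' conflicts with a non-object field")
        # Refuse to overwrite an object that was already built from other keys
        if isinstance(target.get(parts[-1]), dict):
            raise ValueError(f"'{key}' conflicts with an object field")
        target[parts[-1]] = value
    return nested


row = {"user.name": "Username", "user.email": "email@domain.com"}
print(expand_dotted(row))
```

Conflicting keys (for example both "user" and "user.name" in the same row) raise an error rather than being merged silently, which is a design choice of this sketch.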
