Internals of array of strings vs. concatenated string

mitar · December 4, 2020, 9:33am

I am trying to better understand the internals of Elasticsearch, so I would like to know whether there are any differences in how Elasticsearch internally computes term statistics in the following two cases.

The first case is a document like:

{
  "foo": [
    {
      "bar": "long string"
    },
    {
      "bar": "another long string"
    }
  ]
}

The second case is a document like:

{
  "foobar": "long string another long string"
}

My understanding is that the first document gets flattened to:

{
  "foo.bar": ["long string", "another long string"]
}
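To make the flattening I have in mind concrete, here is a minimal Python sketch. It is only a toy model of how an array of objects collapses into one multi-valued field with a dotted name, not Elasticsearch's actual code; the `flatten` helper is hypothetical and assumes `foo` is mapped as a plain object field rather than `nested`.

```python
def flatten(doc, prefix=""):
    """Collect leaf values under dotted paths, merging arrays of objects.

    Hypothetical helper for illustration only; it mimics the flattening
    of object fields, not the real indexing code.
    """
    out = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        # Treat a single value like a one-element array.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Recurse into the object and merge values per dotted path.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    out.setdefault(sub_path, []).extend(sub_values)
            else:
                out.setdefault(path, []).append(item)
    return out


doc = {"foo": [{"bar": "long string"}, {"bar": "another long string"}]}
flattened = flatten(doc)
print(flattened)
# {'foo.bar': ['long string', 'another long string']}
```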

So the question really seems to be: are the second and third documents indexed the same way? Are term statistics computed the same way for both?
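One way to frame the comparison is with a toy analyzer in Python. This is a sketch under assumptions, not a description of what Elasticsearch does internally: it splits on whitespace, lowercases, and inserts a position gap of 100 between array values, which I am assuming corresponds to the documented default of `position_increment_gap` for text fields. The `analyze` function and `POSITION_GAP` constant are hypothetical names.

```python
from collections import Counter

# Assumption: mirrors the documented default position_increment_gap (100).
POSITION_GAP = 100


def analyze(values, gap=POSITION_GAP):
    """Return (term, position) pairs for a list of string values.

    Toy whitespace analyzer; a gap is added between consecutive values.
    """
    tokens = []
    position = -1
    for i, value in enumerate(values):
        if i > 0:
            position += gap  # gap between separate array values
        for term in value.lower().split():
            position += 1
            tokens.append((term, position))
    return tokens


array_tokens = analyze(["long string", "another long string"])
concat_tokens = analyze(["long string another long string"])

# Per-term counts within the document.
array_counts = Counter(term for term, _ in array_tokens)
concat_counts = Counter(term for term, _ in concat_tokens)

print(array_counts == concat_counts)   # term counts compared
print(array_tokens)                    # positions for the array case
print(concat_tokens)                   # positions for the concatenated case
```

In this toy model the per-term counts come out identical and only the token positions differ, which would matter for phrase and proximity matching. Whether every statistic Elasticsearch actually uses for scoring behaves the same way is exactly what I am asking.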

mitar · December 17, 2020, 5:59pm

I got an answer to it on Stack Overflow.

