Body optimisation with nested objects / filter

Emporea · February 1, 2022, 4:51pm

I have a working query; what I am asking is whether there is a simpler way to write it.
I don't understand why using a filter has to be so complicated. Am I missing something? The request body is terribly nested.

I have a field that holds an array of objects, and I declared it as nested.

Two sample values of the title field, one per document, look like this:

[
  [                                  // document 1, field "title"
    { token: "the", POS: "DET" },
    { token: "life", POS: "PROPN" },
    { token: "legacy", POS: "PROPN" },
    { token: "poitier", POS: "PROPN" },
  ],
  [                                  // document 2, field "title"
    { token: "life", POS: "PROPN" },
    { token: "legacy", POS: "PROPN" },
    { token: "’", POS: "PART" },
    { token: "reasoning", POS: "NOUN" },
    { token: "expulsion", POS: "PROPN" },
  ],
]

When aggregating, I filter by POS. In this example, I run a terms aggregation over every token whose POS is "NOUN" or "PROPN". I also want the POS in the result (if the tokenizer did a bad job, a token could have multiple POS values, which is why I also run a terms aggregation with size 0).

My response should ideally look like this:

[
  { key: "life", doc_count: 2, includePOS: "PROPN" },
  { key: "legacy", doc_count: 2, includePOS: "PROPN" },
  { key: "poitier", doc_count: 1, includePOS: "PROPN" },
  { key: "reasoning", doc_count: 1, includePOS: "NOUN" },
  { key: "expulsion", doc_count: 1, includePOS: "PROPN" },
]

But because my search query is so nested, the response body is also very nested. This is my query:

{
  query: {
    query_string: {
      query: "date:[now/d TO now]",
    },
  },
  aggs: {
    title: {
      nested: {
        path: "title",
      },
      aggs: {
        title: {
          filter: {
            terms: {
              "title.POS.keyword": ["NOUN", "PROPN"],
            },
          },
          aggs: {
            title: {
              terms: {
                field: "title.token.keyword",
                size: 10,
              },
              aggs: {
                includesome: {
                  terms: {
                    field: "title.POS.keyword",
                    size: 1,
                  },
                },
              },
            },
          },
        },
      },
    },
  },
}

Is there a simpler way to write this query?
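The nesting in the response mirrors the nesting of the aggregations, so one option is to leave the query as it is and flatten the buckets on the client. The following is a minimal sketch, assuming the response has the usual shape for the query above (aggregations.title.title.title.buckets, each bucket carrying an includesome sub-aggregation). The function name flattenTokenBuckets and the sample response are made up for illustration.

```javascript
// Hypothetical helper: turns the nested aggregation result into a flat list.
// Assumes the aggregation names used in the query above.
function flattenTokenBuckets(response) {
  const buckets = response?.aggregations?.title?.title?.title?.buckets ?? [];
  return buckets.map((bucket) => ({
    key: bucket.key,
    doc_count: bucket.doc_count,
    // includesome is a terms aggregation with size 1, so at most one bucket
    includePOS: bucket.includesome?.buckets?.[0]?.key ?? null,
  }));
}

// Hand-written, trimmed sample in the shape of a response to the query above
const sampleResponse = {
  aggregations: {
    title: {
      doc_count: 9,
      title: {
        doc_count: 7,
        title: {
          buckets: [
            { key: "legacy", doc_count: 2, includesome: { buckets: [{ key: "PROPN", doc_count: 2 }] } },
            { key: "life", doc_count: 2, includesome: { buckets: [{ key: "PROPN", doc_count: 2 }] } },
            { key: "reasoning", doc_count: 1, includesome: { buckets: [{ key: "NOUN", doc_count: 1 }] } },
          ],
        },
      },
    },
  },
};

console.log(flattenTokenBuckets(sampleResponse));
```

Note that doc_count under a nested aggregation counts nested objects (tokens here), not parent documents.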

Tomo_M · February 2, 2022, 2:44am

One reason might be the use of nested fields. It may have side effects, but if you use a flat data structure for the tokens, the query becomes much simpler:

{ book_id: 1, token_seq: 0, token: "the", POS: "DET" }
{ book_id: 1, token_seq: 1, token: "life", POS: "PROPN" }...
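To make the flat layout concrete, here is a rough sketch of how one title array could be split into per-token documents, and what the search body might then look like. The helper name toTokenDocs and the aggregation name tokens are hypothetical, and the sketch assumes token and POS keep a .keyword sub-field as in the original mapping.

```javascript
// Hypothetical helper: one flat document per token.
// book_id and token_seq follow the field names in the example above.
function toTokenDocs(bookId, titleTokens) {
  return titleTokens.map((t, i) => ({
    book_id: bookId,
    token_seq: i,
    token: t.token,
    POS: t.POS,
  }));
}

// With flat documents the POS filter can move into the query,
// so neither a nested nor a filter aggregation is needed.
const flatSearchBody = {
  size: 0, // only the aggregations are of interest, skip the hits
  query: {
    bool: {
      filter: [{ terms: { "POS.keyword": ["NOUN", "PROPN"] } }],
    },
  },
  aggs: {
    tokens: {
      terms: { field: "token.keyword", size: 10 },
      aggs: {
        includesome: { terms: { field: "POS.keyword", size: 1 } },
      },
    },
  },
};

console.log(toTokenDocs(1, [
  { token: "the", POS: "DET" },
  { token: "life", POS: "PROPN" },
]));
```

One of the side effects: any other field used for filtering, such as date, would have to be copied onto every token document.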

Emporea · February 2, 2022, 11:06am

I need to use nested fields, otherwise the connection between token and POS gets lost.

Tomo_M · February 2, 2022, 11:10am

I meant a structure with one document per token, though that is a radical change.

Emporea · February 2, 2022, 11:11am

That's not really possible; there are a lot of other fields. Each array belongs to the data of one document.

Tomo_M · February 2, 2022, 12:09pm

I have no other ideas, sorry.
I'm not sure whether you are already using it, but one thing I can say is that you can trim the output with the filter_path parameter:
GET /your_index/_search?filter_path=aggregations.title
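filter_path also accepts a comma-separated list of dotted paths, so the request above can be narrowed to just the bucket keys and counts. A small sketch of building such a path follows; the function name searchPath is hypothetical, and the paths assume the aggregation names from the original query.

```javascript
// Builds a _search path with a filter_path query parameter.
// "your_index" is the placeholder from the request above.
function searchPath(index, filterPaths) {
  return `/${index}/_search?filter_path=${filterPaths.join(",")}`;
}

const path = searchPath("your_index", [
  "aggregations.title.title.title.buckets.key",
  "aggregations.title.title.title.buckets.doc_count",
]);

console.log(path);
```

This only drops keys from the response. The remaining keys keep their nesting, so the buckets still sit under aggregations.title.title.title.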

system · March 2, 2022, 12:09pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
