Body optimisation with nested objects / filter

I have a working query; what I am asking is whether there is a simpler way to write it.
I don't understand why using a filter has to be so complicated. Am I missing something? The request body is terribly nested.

I have a field that holds an array of objects, and I have mapped it as nested.

Two sample rows of my documents' title field look like this:

[
  [                                  #row 1 of document, field "title"
    { token: "the", POS: "DET" },
    { token: "life", POS: "PROPN" },
    { token: "legacy", POS: "PROPN" },
    { token: "poitier", POS: "PROPN" },
  ],
  [                                  #row 2 of document, field "title"
    { token: "life", POS: "PROPN" },
    { token: "legacy", POS: "PROPN" },
    { token: "’", POS: "PART" },
    { token: "reasoning", POS: "NOUN" },
    { token: "expulsion", POS: "PROPN" },
  ],
]

When aggregating, I filter on the POS. In this example, I run a terms aggregation over every token whose POS is "NOUN" or "PROPN". In the result I want to include the POS (if the tokenizer was bad, a token could have multiple POS values, which is why I am also doing a terms aggregation with size 0).

My response should ideally look like this:

[
  { key: "life", doc_count: 2, includePOS: "PROPN" },
  { key: "legacy", doc_count: 2, includePOS: "PROPN" },
  { key: "poitier", doc_count: 1, includePOS: "PROPN" },
  { key: "reasoning", doc_count: 1, includePOS: "NOUN" },
  { key: "expulsion", doc_count: 1, includePOS: "PROPN" },
]

But because my search query is so nested, the response body is also very nested. This is my query:

{
  query: {
    query_string: {
      query: "date:[now/d TO now]",
    },
  },
  aggs: {
    title: {
      nested: {
        path: "title",
      },
      aggs: {
        title: {
          filter: {
            terms: {
              "title.POS.keyword": ["NOUN", "PROPN"],
            },
          },
          aggs: {
            title: {
              terms: {
                field: "title.token.keyword",
                size: 10,
              },
              aggs: {
                includesome: {
                  terms: {
                    field: "title.POS.keyword",
                    size: 1,
                  },
                },
              },
            },
          },
        },
      },
    },
  },
}

Is there any way to write my query more simply?

One reason might be the use of nested fields. Although it may have side effects, if you use a flat data structure for the tokens, the query becomes much simpler:

{ book_id: 1, token_seq:0, token: "the", POS: "DET"}
{ book_id: 1, token_seq:1, token: "life", POS: "PROPN" }...
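To make the difference concrete, here is a rough sketch of what the request body could look like with one document per token. It assumes every token document also carries the `date` field and that `token` and `POS` have `.keyword` sub-fields; the aggregation names `tokens` and `pos` are made up for illustration. Note that `doc_count` would then count token documents, not the original parent documents.

```javascript
// Sketch only: assumes a flat index with one document per token,
// each carrying date, token and POS (field names are assumptions).
const flatQuery = {
  size: 0, // only the aggregation result is needed, no hits
  query: {
    bool: {
      filter: [
        // Same window as the query_string "date:[now/d TO now]"
        { range: { date: { gte: "now/d", lte: "now" } } },
        // The POS filter moves into the query, so no filter aggregation is needed
        { terms: { "POS.keyword": ["NOUN", "PROPN"] } },
      ],
    },
  },
  aggs: {
    tokens: {
      terms: { field: "token.keyword", size: 10 },
      aggs: {
        // Most frequent POS per token, mirroring the "includesome" sub-aggregation
        pos: { terms: { field: "POS.keyword", size: 1 } },
      },
    },
  },
};

console.log(JSON.stringify(flatQuery, null, 2));
```

The buckets would then sit at `aggregations.tokens.buckets`, two levels shallower than in the nested version.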

I need to use nested fields, otherwise the connection between token and POS gets lost.

I meant a structure with one document per token, though that is a radical change.

That's not really possible; there are a lot of other fields. One array refers to the data in one doc.

I have no idea, sorry.
I'm not sure whether you are already using it, but one thing I can tell you is that you can filter the output with the filter_path parameter:
GET /your_index/_search?filter_path=aggregations.title
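If the nested query has to stay, the nesting in the response can at least be hidden on the client. With the three aggregation levels all named `title` as in the query above, `filter_path=aggregations.title.title.title.buckets` should narrow the response to the buckets, and a small helper can then reshape them into the flat list asked for. The sketch below is only an illustration: `flattenTitleBuckets` is a hypothetical helper name, and the sample response is hand-written with made-up values rather than real output.

```javascript
// Hypothetical helper: collapse the nested aggregation response into flat rows.
function flattenTitleBuckets(response) {
  // The path mirrors the nested -> filter -> terms levels, all named "title".
  const buckets = response?.aggregations?.title?.title?.title?.buckets ?? [];
  return buckets.map((bucket) => ({
    key: bucket.key,
    doc_count: bucket.doc_count,
    // "includesome" has size 1, so at most one POS bucket is present.
    includePOS: bucket.includesome?.buckets?.[0]?.key ?? null,
  }));
}

// Hand-written sample, trimmed to the fields the helper reads (made-up values).
const sampleResponse = {
  aggregations: {
    title: {
      title: {
        title: {
          buckets: [
            { key: "life", doc_count: 2, includesome: { buckets: [{ key: "PROPN", doc_count: 2 }] } },
            { key: "reasoning", doc_count: 1, includesome: { buckets: [{ key: "NOUN", doc_count: 1 }] } },
          ],
        },
      },
    },
  },
};

console.log(flattenTitleBuckets(sampleResponse));
```

This does not make the request body any shorter; it only keeps the nesting out of the code that consumes the result.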
