Querying limited nested data (However nested type is deprecated in 7.x)

I'm on Elasticsearch 7.7 and using the official PHP client to interact with the server.

My issue was somewhat solved here: Need to return part of a doc from a search query/filter; is parent-child the way to go? - #2 by xavierfacq

However "nested" is deprecated in version 7 because types are deprecated: Removal of mapping types | Elasticsearch Guide [7.16] | Elastic

Here is my document:

{
  "offering_id": "1190",
  "account_id": "362353",
  "service_id": "20087",
  "title": "Quick Brown Mammal",
  "slug": "Quick Brown Fox",
  "summary": "Quick Brown Fox",
  "header_thumb_path": "uploads/test/test.png",
  "duration": "30",
  "alter_ids": [
    "59151",
    "58796",
    "58613",
    "54286",
    "51812",
    "50052",
    "48387",
    "37927",
    "36685",
    "36554",
    "28807",
    "23154",
    "22356",
    "21480",
    "220",
    "1201",
    "1192"
  ],
  "premium": "f",
  "featured": "f",
  "events": [
    {
      "event_id": "9999",
      "start_date": "2020-07-01 14:00:00",
      "registration_count": "22",
      "description": "boo"
    },
    {
      "event_id": "9999",
      "start_date": "2020-07-01 14:00:00",
      "registration_count": "22",
      "description": "xyz"
    },
    {
      "event_id": "9999",
      "start_date": "2020-08-11 11:30:00",
      "registration_count": "41",
      "description": "test"
    }
  ]
}

Notice how the object may have one or many "events".

Searching based on event data is the most common use case.

For example:

  • Find events that start before 12pm
  • Find events with a description of "xyz"
  • Find events with a start date in the next 10 days

I would like to NOT return any events that didn't match the query!

So, for example, for "Find events with a description of "xyz"" I would want the result to look like this:

{
  "offering_id": "1190",
  "account_id": "362353",
  "service_id": "20087",
  "title": "Quick Brown Mammal",
  "slug": "Quick Brown Fox",
  "summary": "Quick Brown Fox",
  "header_thumb_path": "uploads/test/test.png",
  "duration": "30",
  "alter_ids": [
    "59151",
    "58796",
    "58613",
    "54286",
    "51812",
    "50052",
    "48387",
    "37927",
    "36685",
    "36554",
    "28807",
    "23154",
    "22356",
    "21480",
    "220",
    "1201",
    "1192"
  ],
  "premium": "f",
  "featured": "f",
  "events": [
    {
      "event_id": "9999",
      "start_date": "2020-07-01 14:00:00",
      "registration_count": "22",
      "description": "xyz"
    }
  ]
}

However, instead it just returns the ENTIRE document, with all events.

Is it even possible to return only a subset of the data? Maybe with Aggregations?

  • Right now, we're doing an "extra" set of filtering on the result set in the application (php in this case) to strip out event blocks that don't match the desired results.
  • It would be nice to just have elastic give directly what's needed instead of doing extra processing on the result to pull out the applicable event.
  • Thought about restructuring the data to instead have it based around "events" but then I would be duplicating data since every offering will have the parent data too.

This used to be in SQL, where there was a relation instead of having the data nested like this.

In Elasticsearch you often need to structure your data differently when moving from a relational database, e.g. by denormalizing. As you are searching for events it would make sense to store these as documents and duplicate the offering information. This will likely give faster and simpler queries while potentially taking up a bit more space. Elasticsearch is pretty good at compressing data so it may not take up as much extra space as you think. You may need to do more work at update time, but if updates are infrequent this is often a price worth paying.
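As a rough illustration of the denormalized layout described above, here is a minimal Python sketch that turns one offering into one flat document per event. The helper name `flatten_offering` is made up for this example, and the sample document is a trimmed copy of the one in the question; it only builds the documents and does not talk to a cluster.

```python
import json
from typing import Any, Dict, List


def flatten_offering(offering: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one flat document per event, each carrying a copy of the parent fields."""
    # Everything except the "events" array is treated as parent data.
    parent = {key: value for key, value in offering.items() if key != "events"}
    docs = []
    for event in offering.get("events", []):
        doc = dict(parent)   # duplicate the offering fields
        doc.update(event)    # add the event's own fields at the top level
        docs.append(doc)
    return docs


# Trimmed version of the document from the question.
offering = {
    "offering_id": "1190",
    "title": "Quick Brown Mammal",
    "events": [
        {"event_id": "9999", "start_date": "2020-07-01 14:00:00", "description": "xyz"},
        {"event_id": "9999", "start_date": "2020-08-11 11:30:00", "description": "test"},
    ],
}

event_docs = flatten_offering(offering)
print(json.dumps(event_docs, indent=2))
```

With documents shaped like this, a plain query on `description` or `start_date` returns only the matching events, each with its offering data attached.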

Thank you. This will work fine for this use case, and it actually makes more sense to base the indexed data around these events. I've actually already updated the indexing jobs to construct the objects like this.

However, future plans with Elasticsearch would ideally use nested objects. Otherwise the amount of data duplication will force more costly infrastructure, which I'd like to avoid by nesting the data and accepting whatever performance hit nested objects bring.

It seems nested types are truly deprecated (even the API docs specifically say type is DEPRECATED in the putMapping() method), so should I store the data I would normally "nest" in a different index? I realize that's "relational" in design, but I'm trying to find the "best practice" for cases where denormalizing that far may not be the most ideal or elegant solution.

Document types are deprecated, not mapping types. Mappings have traditionally specified the document types and this format is now changing as a result of document types going away. Nested documents are still supported.

When using nested documents it is important to understand the tradeoffs you are making (query overhead/limitations, updates get more expensive the more nested documents you have under a document) and not simply try to replicate relational structures, as that can often be inefficient.
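For anyone who does go the nested route, the usual way to get back only the matching events is a `nested` query with `inner_hits`. Below is a minimal Python sketch that just builds the search body. It assumes `events` is mapped as type `nested`, and the helper name `nested_events_search` is invented for this example.

```python
import json


def nested_events_search(description: str) -> dict:
    """Build a search body that matches on events.description and asks for inner hits."""
    return {
        "query": {
            "nested": {
                "path": "events",  # assumes "events" is mapped as a nested field
                "query": {"match": {"events.description": description}},
                # An empty inner_hits object asks Elasticsearch to also return
                # the individual nested events that matched.
                "inner_hits": {},
            }
        }
    }


body = nested_events_search("xyz")
print(json.dumps(body, indent=2))
```

The parent `_source` in each hit still contains every event, but the matching ones are listed separately under that hit's `inner_hits.events.hits.hits`, so the application no longer has to filter them out itself.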

I think it'll help to give some context on what I'm actually trying to build here.

This whole thing started when I was going to trigger putMapping() to set a nested type and I saw this:

So I started reading the docs on the subject -- which eventually led me here.

Document types are deprecated, not mapping types

I promise I'm not trying to be dumb here, and I've read through your response several times to ensure I'm not misunderstanding, but I think I'm still misunderstanding. I have no idea the difference. What is the non-deprecated way to set a type in the mapping?

this format is now changing

Again, sorry, but what is the new format?

Thanks so much for your help, this all made perfect sense until I hit that deprecation message.

Compare these two ways of putting a mapping in version 6.8 and the latest 7.8:

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/indices-put-mapping.html#updating-field-mappings (6.8)

https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html#add-multi-fields-existing-field-ex (7.8)

Notice that the document type _doc is no longer specified. This is described further in the last section of the 6.8 docs.
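To make the difference concrete, here is a small Python sketch that builds the same index mapping in both shapes: the 6.x style wrapped in a `_doc` document type, and the 7.x typeless style. The field types chosen for the event fields (including the date format) are guesses based on the sample document, not something confirmed in this thread. Note that `"type": "nested"` is a field data type and appears unchanged in both.

```python
import json

# Field types below are assumptions based on the sample document.
event_properties = {
    "event_id": {"type": "keyword"},
    "start_date": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
    "registration_count": {"type": "integer"},
    "description": {"type": "text"},
}

# "nested" here is a field data type, unrelated to the removed document types.
properties = {"events": {"type": "nested", "properties": event_properties}}

# 6.x style: properties sit under a document type name such as "_doc".
typed_body = {"mappings": {"_doc": {"properties": properties}}}

# 7.x style: no document type level, properties sit directly under "mappings".
typeless_body = {"mappings": {"properties": properties}}

print(json.dumps(typeless_body, indent=2))
```

The only thing that goes away in 7.x is the `_doc` level; the nested field definition itself is identical.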

Thanks again!
