Showing min and max results only for relevant data types

Suppose I have objects such as house, person, toy, mouse, etc.

And there are the following custom data types:

  • int
  • double
  • text

Different objects' properties hold different data types:

  • price is a double
  • length is a double
  • email is a text
  • number of floors is an int

What is the most appropriate way of saving this data into Elasticsearch, and then retrieving statistical data about object properties? More details follow.

I came up with 2 possible design options:

Design option 1

$object1 = [
    "type" => "house",
    "properties" => [
        [
            "type" => "length",
            "value" => 10.12,
        ], [
            "type" => "floor_cnt",
            "value" => 5,
        ],
    ],
];

$object2 = [
    "type" => "person",
    "properties" => [
        [
            "type" => "length",
            "value" => 168.12,
        ], [
            "type" => "email",
            "value" => "foo@foo.bar",
        ],
    ],
];

Design option 2

$object1 = [
    "type" => "house",
    "properties" => [
        "length" => 10.12,
        "floor_cnt" => 5,
    ],
];

$object2 = [
    "type" => "person",
    "properties" => [
        "length" => 168.12,
        "email" => "foo@foo.bar",
    ],
];

Index mappings

$params = [
    'index' => 'my_index',
    'body' => [
        'mappings' => [
            'my_data' => [
                '_source' => [
                    'enabled' => true,
                ],
                'properties' => [
                    // the document field that is itself named "properties"
                    "properties" => [
                        "type" => "nested",
                    ],
                ],
            ],
        ],
    ],
];

Note: with design 2, the nested mapping is not needed.
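
To make the non-nested variant concrete, a design 2 mapping could look roughly like the sketch below. The field types are my assumptions rather than something I have settled on: `email` is mapped as `keyword` so that it can be counted in aggregations, and `floor_cnt` is assumed to fit an `integer`.

```php
<?php
// Sketch of a design 2 mapping for ES 5.x. Field types are assumptions.
$params = [
    'index' => 'my_index',
    'body' => [
        'mappings' => [
            'my_data' => [
                '_source' => ['enabled' => true],
                'properties' => [
                    'type' => ['type' => 'keyword'], // "house", "person", ...
                    // Plain object field that happens to be named "properties".
                    // The inner "properties" key lists its sub-fields, so no
                    // "nested" type is involved.
                    'properties' => [
                        'properties' => [
                            'length'    => ['type' => 'double'],
                            'floor_cnt' => ['type' => 'integer'],
                            'email'     => ['type' => 'keyword'],
                        ],
                    ],
                ],
            ],
        ],
    ],
];
```

With the official PHP client, an array like this would be passed to `$client->indices()->create($params)`.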

I cannot decide whether to use the first design, the second, or possibly some other option. Here is what I am trying to accomplish. The result I expect from the aggregation query is:

doc_count = 2
properties
    length
        count = 2
        min = 10.12
        max = 168.12
    floor_cnt
        count = 1
        min = 5
        max = 5
    email
        count = 1

That is, min and max should only be calculated for the int and double data types (or, alternatively, only for the properties length and floor_cnt), together with the total number of occurrences of each property.
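
For design 2, the closest thing I can sketch is one aggregation per property: `stats` for the numeric ones and `value_count` for the text one. This is only a sketch. It assumes the fields are reachable as `properties.length`, `properties.floor_cnt` and `properties.email`, and that `properties.email` is aggregatable (for example mapped as `keyword`).

```php
<?php
// Sketch of a design 2 aggregation request. Field paths are assumptions.
$params = [
    'index' => 'my_index',
    'type'  => 'my_data',
    'body'  => [
        'size' => 0, // return aggregations only, no hits
        'aggs' => [
            // stats returns count, min, max, avg and sum for a numeric field
            'length'    => ['stats' => ['field' => 'properties.length']],
            'floor_cnt' => ['stats' => ['field' => 'properties.floor_cnt']],
            // value_count only counts the values of the field
            'email'     => ['value_count' => ['field' => 'properties.email']],
        ],
    ],
];
// $response = $client->search($params);
```

The drawback is that every property has to be listed explicitly, and the query has to know in advance which ones are numeric.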

Currently I don't see an easy way of retrieving such data using Elasticsearch aggregations. Any help or advice is appreciated.
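
For design 1, a nested aggregation is one possible direction, though it needs changes to the documents. The sketch below assumes that `properties.type` is mapped as `keyword` and that numeric values are stored in a hypothetical `properties.value_num` field of type `double`, since a single `value` field cannot hold both doubles and text under one mapping. Neither assumption is part of the mapping shown earlier.

```php
<?php
// Sketch of a design 1 aggregation request.
// "value_num" is a hypothetical numeric field, not part of the original design.
$params = [
    'index' => 'my_index',
    'type'  => 'my_data',
    'body'  => [
        'size' => 0, // return aggregations only, no hits
        'aggs' => [
            'props' => [
                // step into the nested "properties" entries
                'nested' => ['path' => 'properties'],
                'aggs' => [
                    // one bucket per property name; doc_count is its occurrences
                    'by_name' => [
                        'terms' => ['field' => 'properties.type'],
                        'aggs' => [
                            // min/max per property name over the numeric field
                            'value_stats' => [
                                'stats' => ['field' => 'properties.value_num'],
                            ],
                        ],
                    ],
                ],
            ],
        ],
    ],
];
// $response = $client->search($params);
```

Buckets for text-only properties such as email should still report their `doc_count`, while their `value_stats` would have nothing to compute min and max from.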

I am using ES 5.x
