A query to create flat table from different data


#1

Hi,

I'm trying to come up with a query that will help me take my data and aggregate it into one table.

I have many records such as the following:

{
"_index": "doc",
"_type": "info",
"_id": "4",
"_routing": "123",
"_parent": "123",
"_source": {
			"data:{
                      "name": "kuku",
			}
			"id": "123"
		    }
}

{
"_index": "doc",
"_type": "info",
"_id": "6",
"_routing": "678",
"_parent": "678",
"_source": {
			"data:{
                     "name": "kuku",
			}
			"id": "678"
		   }
}

{
"_index": "doc",
"_type": "info",
"_id": "8",
"_routing": "213",
"_parent": "213",
"_source": {
			"data:{
                     "name": "kuku2",
			}
			"id": "213"
		     }
}

and i want to go over all the info types and create the following:

Kuku: [123, 678]
Kuku2: [213]

I wanted to ask if its possible to create this with one query?
A query that will go over all the info types and each time take the name field and add the id field to a list by its name field

Thank you!


(Abdon Pijpelink) #2

While it doesn't give you a flat table, a terms aggregation nested in another terms aggregation would give you what you want in a single request. Assuming the name and id fields are mapped as type keyword, or have fielddata enabled, the following request:

GET doc/info/_search
{
  "size": 0, 
  "aggs":{
    "names": {
      "terms": {
        "field": "data.name",
        "size": 100
      },
      "aggs": {
        "ids": {
          "terms": {
            "field": "id",
            "size": 100
          }
        }
      }
    }
  }
}

Returns:

  "aggregations": {
    "names": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "kuku",
          "doc_count": 2,
          "ids": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "123",
                "doc_count": 1
              },
              {
                "key": "678",
                "doc_count": 1
              }
            ]
          }
        },
        {
          "key": "kuku2",
          "doc_count": 1,
          "ids": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "213",
                "doc_count": 1
              }
            ]
          }
        }
      ]
    }
  }

Some things to be aware of here: because of the "size": 100 you will only get the 100 most common names and IDs. You can change the size parameter if you need more or less values. Also, The values are ranked by popularity (how common they are - based on the doc_count). If you want, you can order the results differently.


(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.