About writing elasticsearch subquery DSL

I having a template mappings like this:

{
  "template": "ones_bug-*",
  "settings": {
    "index": {
      "number_of_shards": "5",
      "number_of_replicas": "1"
    }
  },
  "mappings": {
    "bug_info": {
      "_all": {
        "enabled": false
      },
      "properties": {
        "createdAt": {
          "type": "date"
        },
        "startAt": {
          "type": "date"
        },
        "updatedAt": {
          "type": "date"
        },
        "closeAt": {
          "type": "date"
        },
        "expectClose": {
          "type": "date"
        },
        "crashMsg": {
          "type": "text"
        },
        "expectTime": {
          "type": "float"
        },
        "actualWorkTime": {
          "type": "float"
        },
        "fixDuration": {
          "type": "float"
        },
        "closeDuration": {
          "type": "float"
        },
        "bugReason": {
          "type": "keyword"
        },
        "onesId": {
          "type": "keyword"
        },
        "state": {
          "type": "keyword"
        },
        "iterationId": {
          "type": "keyword"
        },
        "moduleId": {
          "type": "keyword"
        },
        "severityId": {
          "type": "keyword"
        },
        "severityName": {
          "type": "keyword"
        },
        "createdUser": {
          "type": "keyword"
        },
        "updatedUser": {
          "type": "keyword"
        },
        "verifiedUser": {
          "type": "keyword"
        },
        "assignedUser": {
          "type": "keyword"
        },
        "subtypeId": {
          "type": "keyword"
        },
        "priority": {
          "type": "keyword"
        },
        "priorityId": {
          "type": "keyword"
        },
        "priorityName": {
          "type": "keyword"
        },
        "labels": {
          "type": "keyword"
        },
        "title": {
          "type": "keyword"
        },
        "projectId": {
          "type": "keyword"
        },
        "md5": {
          "type": "keyword"
        },
        "onesUrl": {
          "type": "text"
        }
      }
    }
  },
  "aliases": {}
}

Now, I want to query some docs by DSL which like the sql:

select count(state) from (
	select max(updateAt), onesId from master group by onesId
) group by state

How could I write this DSL?

Elasticsearch does not support joins or subqueries, so there is no direct translation. If you describe what you are looking to achieve in greater detail someone might be able to help with how to best model it.

hi Christian_Dahlqvist,
Thanks for your reply.
The job I want to achieve is to save the bug record in ES by every round of automated testing. Every bug record is saved as doc includes bug id field, bug state field and other fields just like the template mapping shows above. And there will be multiple docs represent one bug record that has same bug id, but different bug state.
Now, I want to get the number of bug record group by every bug state, eg: how much bugs which state is 'opened', or how much bugs which state is 'closed'. As there are multiple docs for the same bug id but different bug state, I used 'updateAt' field which value is latest time to avoid repeatedly statistics.
So as my train of thought, firstly, I want to get every latest 'updateAt' bug record, the sql like:

select max(updateAt), onesId from master group by onesId

Then, based on first query result to make statistics group by state:

select count(state) from (
	select max(updateAt), onesId from master group by onesId
) group by state

So, how could I design my index structure?

When working with a data store that does not support joins and subqueries it is quite common to move some of the work to index time and set the data up for simpler and more efficient querying. The index you currently have shows the history and can be used to efficiently support some types of queries. If you however have a query that should only consider the latest state of some entity, it often helps to create a separate entity-centric index. This index would have a document for each entity and the document ID would be set to something that uniquely identifies the entity. Whenever you insert a change to the current index you also update or override the corresponding document in the new entity-centric index. This new index therefore conatins a single document per entity that contains the latest state and can be used for analysis without requiring your subquery. There is more work at index time but the benefit is that you have two indices that can efficiently serve different types of queries.

I update stock doc finally, thanks for your help~

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.