Filter with millions of records

(Puroo Jain) #21

The charts work on the basis of ids.

The main problem is that if we fetch details for 4 to 5 lakh (400K-500K) records without any terms filter, the query takes too much time.
If we use a terms filter on the ids, it also takes time.
Please suggest something :neutral_face:

(Christian Dahlqvist) #22

Please do not post images of text. Instead copy it and make sure it is properly formatted using the tools in the UI. What do the other document types look like (locations and assets)?

(Puroo Jain) #23

Location and Assets are document types of the cooler-iot-db index, which was shared above.
The other indices have the same format.

(Christian Dahlqvist) #24

If you have all the document ids and are retrieving the related documents by id, why are you not using the multi-get API instead of a terms query?
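
For reference, a multi-get request fetches a batch of documents by id in one round trip, without running a query. A minimal sketch, assuming the index name `cooler-iot-db` with document type `doc` and placeholder ids (adjust to your actual names):

```
GET /cooler-iot-db/doc/_mget
{
  "ids": ["id-1", "id-2", "id-3"]
}
```

The response contains the documents in the same order as the ids, which avoids the scoring and filtering overhead of a terms query when the ids are already known.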

(Deepak Chaudhary) #25


(David Pilato) #26

Please format your code, logs, or configuration files using the </> icon as explained in this guide, and not the citation button. It will make your post more readable.

Or use markdown style like:


This is the icon to use if you are not using markdown format:

There's a live preview panel for exactly this reason.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
Please update your post.

(Deepak Chaudhary) #27

David, I apologize, but it is now corrected.

Christian, I think the multi-get API is used to get documents by ids, but we also have calculations, after which we find the actual ids.

Example: in the screenshot shared above, if we click on 8-12 hours, then we also need an extra field like "Duration" from which we calculate the 8-12 hours range.

(David Pilato) #28

You can index one single document SmartDeviceHealthRecord like:

      "Street":"31 CESTA",
      "Street2":" ",
      "Street3":"MERCATOR ",
      "LocationType":"CONVENIENCE STORE",
      "Classification":"KA/FC Silver 02",
      "Address":"31 CE",
      "SalesTerritory":"FC MIX",
      "SalesGroup":"FC MIX East",
      "SalesOffice":"FC MIX",
      "SalesOrganization":"CCHBC Slovenija",
      "SubTradeChannelType":"Chain ",
      "SalesHierarchy":"FC MIX",
   "SalesHierarchy":"Sofia 2E1",

(Deepak Chaudhary) #29

David, we have no unique Asset and Location in the Health index, and we have too many records in it, so we create the Health index month-wise.

If one record has Asset A and Location L in the Asset and Location indices, then we have 10K records for Asset A and Location L in the Health index.
If we create a single index and an update comes for a location, it is too difficult to update the records in the Health index.
So we created separate indices for Asset and Location, so that if any update comes it can be applied easily.

(David Pilato) #30

> it's too difficult to update the records in the health index.

Maybe. Maybe not. That is worth exploring, IMO.

Otherwise you can have a look at the parent-child feature, but I'm not a big fan of it, TBH.

(Christian Dahlqvist) #31

When you are working with Elasticsearch you often need to denormalise in order to support flexible and efficient aggregations and searching. When doing so you need to look at how you split work between indexing and querying. If you can do more work at indexing time you can benefit from this at query time.

If you were to denormalise as David suggested, updates would be more complicated and require more documents to be modified, but if these are not done very frequently it may be worth it in order to get much more efficient queries.
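
To illustrate what propagating a master-data change into denormalised documents could look like, here is a sketch using the update-by-query API. The index name, field name, and values below are placeholders, not taken from this thread:

```
POST /health-2018-03/_update_by_query
{
  "query": {
    "term": { "LocationId": "L" }
  },
  "script": {
    "source": "ctx._source.Street = params.street",
    "params": { "street": "31 CESTA" }
  }
}
```

One such request rewrites every matching health document in place, so even a large fan-out can be handled as a single background operation rather than per-document updates from the application.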

(Deepak Chaudhary) #32

These updates of Assets and Locations happen frequently, as that is master data and we get regular updates from the client, so it will be difficult to update the Health records, because they can number in the millions or more in the future.

Health and some other types of data are telemetry data from our IoT devices. Right now we have around 60K devices functional; in the next 2-3 months 300K devices will be functional, and 1 device creates approx 200 records per day.

So we need a solution which fits this scenario in a very efficient manner.

(Christian Dahlqvist) #33

I can unfortunately not think of any other ways to make this more efficient at the moment.

(David Pilato) #34


You have basically 3 choices:

  • You absolutely want a relational model: use a relational database, but you will most likely give up on speed
  • You want to use Elasticsearch with a parent-child model: use the parent-child feature, but you will give up a bit on speed and memory usage, and it will require more complex queries
  • You want the fastest possible response time: denormalize your data, but you are giving up on indexing speed and it will require more effort on your side (for reindexing, for example)

Always a question of choice. You can't have everything, IMO.
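
For completeness, the parent-child option mentioned above corresponds to the join field type in Elasticsearch 6.x. A minimal mapping sketch; the index, type, field, and relation names here are assumptions for illustration:

```
PUT /cooler-iot
{
  "mappings": {
    "doc": {
      "properties": {
        "asset_health_relation": {
          "type": "join",
          "relations": { "asset": "health" }
        }
      }
    }
  }
}
```

With this mapping, health documents are indexed as children of their asset (routed to the same shard), and queries use `has_parent` / `has_child`, which is where the extra query complexity and memory cost come from.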

(Deepak Chaudhary) #35

Thanks David, it looks like the 3rd one is the best approach to go with.

(Deepak Chaudhary) #36

Hi All,
As we discussed, we have now changed the DB structure: we save all records in just one index and store all data day-wise per Asset.
We also upgraded Elasticsearch to 6.2.3.
Now, as we store data day-wise, in one query I create 10 aggregations and find the sum for Assets day-wise.
We have approx 30K documents per aggregation, so 30K times 10 aggregations means 300K documents.
The problem is that the query takes 25 sec to respond.
Please suggest how we can improve the query.
I attach one aggregation below.

 "CurrentHoursCorrectTemperature": {
  "filter": {
    "bool": {
      "must": [
          "range": {
            "EventDate": {
              "gte": "2018-03-01T00:00:00",
              "lte": "2018-03-31T23:59:59"
      "must_not": [
          "term": {
            "SumOfHealthIntervalBetweenTemperature1To12": 0
  "aggs": {
    "AssetCount": {
      "cardinality": {
        "field": "AssetId"
    "Asset": {
      "terms": {
        "field": "AssetId",
        "size": 100000
      "aggs": {
        "HoursCorrectTemperature": {
          "sum": {
            "field": "SumOfHealthIntervalBetweenTemperature1To12"

(Puroo Jain) #37


(David Pilato) #38

This looks like a bad idea to me:

"size": 100000

Not sure if this is the problem here but I'd start by removing that.

(Puroo Jain) #39

If we remove size, it takes the default value of 10 and provides only 10 results in the bucket,
but we have more data.
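
When all buckets are needed, one alternative to a terms aggregation with a huge size is the composite aggregation (available since Elasticsearch 6.1), which pages through buckets in fixed-size chunks. A sketch reusing the field names from the query above:

```
"Asset": {
  "composite": {
    "size": 1000,
    "sources": [
      { "AssetId": { "terms": { "field": "AssetId" } } }
    ]
  },
  "aggs": {
    "HoursCorrectTemperature": {
      "sum": { "field": "SumOfHealthIntervalBetweenTemperature1To12" }
    }
  }
}
```

Each response includes an `after_key`, which is passed back as `"after"` in the next request to fetch the next page. Note that in 6.x a composite aggregation must be a top-level aggregation, so the date-range filter from the query above would need to move into the request's `query` section instead of a filter aggregation.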

(Puroo Jain) #40

I need more help. In the query below, if I need only a count in the script, what should I change?

 "PowerData": {
  "aggs": {
    "AssetBucket": {
      "terms": {
        "field": "AssetId",
        "size": 10000
      "aggs": {
        "PowerOffDuration": {
          "sum": {
            "field": "SumOfPowerOffDuration"
        "conversion_ratio": {
          "bucket_script": {
            "buckets_path": {
              "total_clicks": "PowerOffDuration"
            "script": {
              "source": "return [params.total_clicks].sum() /3100/31 >=1  && [params.total_clicks].sum() /3100/31 < 12 ? 1 : 0"