Filter with millions of record

Christian_Dahlqvist · June 13, 2018, 9:57am

Please do not post images of text. Instead copy it and make sure it is properly formatted using the tools in the UI. What do the other document types look like (locations and assets)?

Puroo_Jain · June 13, 2018, 10:06am

Location and Assets are the type of cooler-iot-db index which are share above.
Other index are also same format.

Christian_Dahlqvist · June 13, 2018, 10:33am

If you have all the document ids and are retrieving the related documents by this, why are you not using the multiget API instead if a terms query?

chaudharys31 · June 13, 2018, 10:35am

abcd

dadoonet · June 13, 2018, 10:52am

Please format your code, logs or configuration files using </> icon as explained in this guide and not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

This is the icon to use if you are not using markdown format:

There's a live preview panel for exactly this reasons.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
Please update your post.

chaudharys31 · June 13, 2018, 10:58am

David, I apologize but now corrected.

Christian, multi-get API I think to use for get the ids, but we have calculation also..after which we find actual ids.

EX: In above share screenshot, if click on 8-12 hours, then we also need some extra field like "Duration" by which we calculate 8-12 hours.

dadoonet · June 13, 2018, 12:21pm

You can index one single document SmartDeviceHealthRecord like:

{
   "LightIntensity":104,
   "Temperature":11.5,
   "Humidity":0,
   "SoundLevel":0,
   "IsDoorOpen":0,
   "EventTypeId":0,
   "PowerStatusId":0,
   "BatteryLevel":65,
   "SerialNumberPrefix":"SBC",
   "IsTemperatureIssue":false,
   "IsLightIssue":false,
   "AlertProcessorId":1498760266,
   "DeviceLightStatus":"FullLightBrightness",
   "HealthInterval":10,
   "Id":"1CCAE32009B72017060213300028148",
   "SmartDeviceId":92,
   "GatewayId":35483,
   "DeviceSerial":92487,
   "GatewayMac":"463",
   "GatewaySerialNumber":"4552",
   "EventId":28148,
   "EventTime":"2017-06-02T13:30:00",
   "CreatedOn":"2017-06-29T10:28:38.693",
   "Asset":{
      "Id":"1998356",
      "TagNumber":"",
      "SerialNumber":"RU7151982281",
      "LocationId":0,
      "AssetTypeId":0,
      "IsActive":false,
      "LocationGeo":{
         "lat":0,
         "lon":0
      },
      "LatestStockId":0,
      "LatestPurityId":0,
      "LatestProcessedPurityId":0,
      "LatestDoorStatusId":0,
      "LatestHealthRecordId":0,
      "CreatedByUserId":27232,
      "CreatedOn":"2018-05-22T11:20:25.15",
      "ModifiedByUserId":27232,
      "ModifiedOn":"2018-05-22T11:20:25.15",
      "IsDeleted":false,
      "ParentAssetId":0,
      "Installation":"2018-05-22T11:20:25.15",
      ..
   },
   "ClientId":1,
   "CountryId":26,
   "StateId":0,
   "Location":{
      "Id":"3513720",
      "CustomerId":0,
      "AreaId":0,
      "LocationTypeId":41,
      "Name":"MERCATOR, D.D. MARKET ZAGORJE",
      "CountryId":136,
      "StateId":0,
      "City":"ZAGORJE",
      "Street":"31 CESTA",
      "Street2":" ",
      "Street3":"MERCATOR ",
      "PostalCode":"1410",
      "IsKeyLocation":true,
      "ClassificationId":214,
      "DirectionNotes":"",
      "ClientId":1,
      "IsDeleted":false,
      "CreatedByUserId":5829,
      "CreatedOn":"2017-11-01T13:58:24.197",
      "ModifiedByUserId":5829,
      "ModifiedOn":"2018-04-09T10:14:37.417",
      "LocationCode":"2100032350",
      "PrimaryContactId":0,
      "Identity1":"",
      "Identity2":"",
      "UniqueKey":0,
      "TimeZoneId":39,
      "LocationText":"",
      "OrderStatus":"",
      "MarketId":0,
      "PrimaryRepId":0,
      "PrimaryPhone":"035660140",
      "PriceTypeId":0,
      "PriceForTwo":0,
      "OwnerId":0,
      "LocationSalesRep":"",
      "LocationGeo":{
         "lat":46.1321,
         "lon":14.9957
      },
      "NearestLocationGeo":{
         "lat":0,
         "lon":0
      },
      "IsSmart":true,
      "PrimarySalesRep":"",
      "State":"",
      "Country":"Slovenia",
      "MarketName":"",
      "LocationType":"CONVENIENCE STORE",
      "IsSurveyed":false,
      "Classification":"KA/FC Silver 02",
      "Address":"31 CE",
      "SubTradeChannelTypeId":71,
      "TerritoryId":11576,
      "SalesGroupId":3228,
      "SalesOfficeId":3214,
      "SalesOrganizationId":3213,
      "SalesTarget":0,
      "LocationId":3513720,
      "SalesTerritory":"FC MIX",
      "SalesGroup":"FC MIX East",
      "SalesOffice":"FC MIX",
      "SalesOrganization":"CCHBC Slovenija",
      "SubTradeChannelType":"Chain ",
      "SalesHierarchyId":11576,
      "SalesHierarchy":"FC MIX",
      "SalesTerritoryCode":"21EB558",
      "CurrentlyAsssignedBDUsername":"",
      "CurrentlyAssignedBDName":"",
      "OutletType":"Market",
      "OutletTypeId":6282,
      "TeleSellingTerritoryId":13814,
      "TeleSellingTerritoryCode":"21CX30",
      "TeleSellingTerritoryName":"KR-TS"
   },
   "City":"СОФИЯ",
   "TimeZoneId":51,
   "AssetTypeId":171,
   "LocationTypeId":41,
   "SubTradeChannelTypeId":68,
   "ClassificationId":41,
   "LocationCode":"1304018503",
   "IsKeyLocation":true,
   "PrimaryRepId":0,
   "AssetManufacturerId":25,
   "TerritoryId":289,
   "SalesGroupId":67,
   "SalesOfficeId":25,
   "SalesOrganizationId":5,
   "SmartDeviceManufacturerId":3,
   "AssetSerialNumber":"640439",
   "AssetCapacity":0,
   "SalesHierarchyId":396,
   "SalesHierarchy":"Sofia 2E1",
   "OutletType":"Market",
   "OutletTypeId":6282
}

chaudharys31 · June 13, 2018, 12:38pm

David, We have no unique Asset and Location in Health Index,
And we have too many records in index..we have create health index for month wise.

if one record has A Asset and L location in Asset and location index, then we have 10K records for A asset and L location in Health Index.
if we create single index and somethng comes to update in location then its too diffucult to update the the record in health index.
So, we create separate index for Asset and Location, so if any update comes then easily update the record.

dadoonet · June 13, 2018, 12:52pm

its too diffucult to update the the record in health index.

May be. May be not. That is worth exploring IMO.

Otherwise you can have a look at the parent child feature but I'm not a big fan of it TBH.

Christian_Dahlqvist · June 13, 2018, 12:57pm

When you are working with Elasticsearch you often need to denormalise in order to support flexible and efficient aggregations and searching. When doing so you need to look at how you split work between indexing and querying. If you can do more work at indexing time you can benefit from this at query time.

If you were to denormalise as David suggested, updates would be more complicated and require more documents to be modified, but if these are not done very frequently it may be worth it in order to get much more efficient queries.

chaudharys31 · June 14, 2018, 6:22am

These update of Asset and Location have frequently as that is Master Data and we got regular updates from the Client so it will will difficult to update the Health records because it can be in millions or more in future.

Because Health and some more type of data is telematry data of our IOT devices and right now we have arround 60 K device functional and in near future on 2-3 Months 300 K device will be functional and 1 device create approx 200 recrods per day.

So we need a solution which will fit this scenario in a very efficient manner.

Christian_Dahlqvist · June 14, 2018, 6:48am

I can unfortunately not think of any other ways to make this more efficient at the moment.

dadoonet · June 14, 2018, 7:03am

Agreed.

You have basically 3 choices:

You absolutely want to have a relational model: use a relational database but you will most likely give up on speed
You want to use elasticsearch with parent-child model: use parent-child feature but you will give up a bit on speed and on memory usage and it will require more complex queries
You want to have the fastest response time as possible: denormalize your data but you are giving up on indexation speed and it will require more efforts on your side (for reindex for example)

Always a question of choice IMO. You can't have everything IMO.

chaudharys31 · June 14, 2018, 8:47am

Thx David , It's looking 3rd one is the best approach to go with that.

chaudharys31 · June 28, 2018, 6:40am

Hi All,
As we discussed, now we have change DB Structure and save all record in just one index and store all data in day wise Assets.
Also upgrade the elastic in 6.2.3
Now, as we store data day wise and in one query I create 10 aggregation and find sum of Assets in daywise.
And we have approx 30K data in one agg so, 30k into 10 aggregation means 300K data.
So problem is query take 25 sec for response.
Please suggest to which we improve the query.
I Attach one aggregation code.

 "CurrentHoursCorrectTemperature": {
  "filter": {
    "bool": {
      "must": [
        {
          "range": {
            "EventDate": {
              "gte": "2018-03-01T00:00:00",
              "lte": "2018-03-31T23:59:59"
            }
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "SumOfHealthIntervalBetweenTemperature1To12": 0
          }
        }
      ]
    }
  },
  "aggs": {
    "AssetCount": {
      "cardinality": {
        "field": "AssetId"
      }
    },
    "Asset": {
      "terms": {
        "field": "AssetId",
        "size": 100000
      },
      "aggs": {
        "HoursCorrectTemperature": {
          "sum": {
            "field": "SumOfHealthIntervalBetweenTemperature1To12"
          }
        }
      }
    }
  }
}

Puroo_Jain · June 29, 2018, 11:41am

Anyone?

dadoonet · June 29, 2018, 9:52pm

This looks a bad idea to me:

"size": 100000

Not sure if this is the problem here but I'd start by removing that.

Puroo_Jain · July 2, 2018, 9:20am

If we remove size, then its take default value 10 and provides only 10 result in bucket,
But we have more data.

Puroo_Jain · July 2, 2018, 9:23am

I need more help. In below query if I need only count in the script, then what will I change?

 "PowerData": {
  "aggs": {
    "AssetBucket": {
      "terms": {
        "field": "AssetId",
        "size": 10000
      },
      "aggs": {
        "PowerOffDuration": {
          "sum": {
            "field": "SumOfPowerOffDuration"
          }
        },
        "conversion_ratio": {
          "bucket_script": {
            "buckets_path": {
              "total_clicks": "PowerOffDuration"
            },
            "script": {
              "source": "return [params.total_clicks].sum() /3100/31 >=1  && [params.total_clicks].sum() /3100/31 < 12 ? 1 : 0"
            }
          }
        }
      }
    }
  }
}

dadoonet · July 2, 2018, 10:36am

I know that. I just want to know if this makes things faster or not.

Topic		Replies	Views
Filter on thousands of IDS - How is it efficient? Elasticsearch	2	618	July 6, 2017
Very Large Terms Query Elasticsearch	5	5978	July 5, 2017
Efficient query with a big input filter set? Elasticsearch	4	1239	July 6, 2017
Speed of query with many filters Elasticsearch	6	371	July 6, 2017
Why returning many records with no fields (only IDs) is CPU intensive? Elasticsearch	3	321	June 14, 2018

Filter with millions of record

Related topics