After realizing that the Elasticsearch search API only returns up to 10,000 results (the default index.max_result_window limit on from + size), my next option for retrieving all the logs (more than 10,000 results) was to create a point in time ID and pass that ID to the search API together with the search_after parameter.
So I did the following to retrieve the PIT (point in time) ID:
POST /customer-simulation-es-app-logs*/_pit?keep_alive=5m
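For comparison, the NEST counterpart of this console request might look like the minimal sketch below. It assumes a NEST 7.x release that already includes the point in time API (OpenPointInTimeAsync), which you should verify against your client version, and an already configured IElasticClient; the PitHelpers class and OpenPitAsync method are hypothetical names used only for illustration.

```csharp
using System;
using System.Threading.Tasks;
using Nest;

// Hypothetical helper; assumes a NEST 7.x version that ships the PIT API.
public static class PitHelpers
{
    // Mirrors: POST /customer-simulation-es-app-logs*/_pit?keep_alive=5m
    public static async Task<string> OpenPitAsync(IElasticClient client)
    {
        var pitResponse = await client.OpenPointInTimeAsync(
            "customer-simulation-es-app-logs*",
            d => d.KeepAlive("5m"));

        if (!pitResponse.IsValid)
            throw new InvalidOperationException(pitResponse.DebugInformation);

        // This is the value that goes into "pit": { "id": ... } in later searches.
        return pitResponse.Id;
    }
}
```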
After doing so, I used that PIT ID in the search API:
GET /_search
{
  "size": 5,
  "_source": ["@timestamp", "messageTemplate", "message"],
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2021-06-07T00:00:00Z",
        "lte": "2021-06-08T00:00:00Z"
      }
    }
  },
  "pit": {
    "id": "85ezAwIzY3VzdG9tZXItc2ltdWxhdGlvbi1lcy1hcHAtbG9ncy1kZXZlbG9wbWVudC0yMDIxLTA3FmRBanRhWDJYVFFlZjJKSmFhYW95Z3cAFmhaQ3NlRDVKUjBHdjI0dGhrblJpTWcAAAAAAAAACMYWdl9IU19sUFFRLVdlSUM3bVJKWFR2UQAzY3VzdG9tZXItc2ltdWxhdGlvbi1lcy1hcHAtbG9ncy1kZXZlbG9wbWVudC0yMDIxLTA2FlZTUGZOT1o2VGptal8zVEZwTWtZb0EAFmhaQ3NlRDVKUjBHdjI0dGhrblJpTWcAAAAAAAAACMUWdl9IU19sUFFRLVdlSUM3bVJKWFR2UQACFlZTUGZOT1o2VGptal8zVEZwTWtZb0EAABZkQWp0YVgyWFRRZWYySkphYWFveWd3AAA=",
    "keep_alive": "5m"
  },
  "sort": [
    {"@timestamp": {"order": "asc"}}
  ]
}
I then reused the PIT ID, added a search_after parameter holding the sort values of the last hit, and set track_total_hits to false to disable total-hit tracking and speed up pagination:
GET /_search
{
  "size": 5,
  "_source": ["@timestamp", "messageTemplate", "message"],
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2021-06-07T00:00:00Z",
        "lte": "2021-06-08T00:00:00Z"
      }
    }
  },
  "pit": {
    "id": "85ezAwIzY3VzdG9tZXItc2ltdWxhdGlvbi1lcy1hcHAtbG9ncy1kZXZlbG9wbWVudC0yMDIxLTA3FmRBanRhWDJYVFFlZjJKSmFhYW95Z3cAFmhaQ3NlRDVKUjBHdjI0dGhrblJpTWcAAAAAAAAACMYWdl9IU19sUFFRLVdlSUM3bVJKWFR2UQAzY3VzdG9tZXItc2ltdWxhdGlvbi1lcy1hcHAtbG9ncy1kZXZlbG9wbWVudC0yMDIxLTA2FlZTUGZOT1o2VGptal8zVEZwTWtZb0EAFmhaQ3NlRDVKUjBHdjI0dGhrblJpTWcAAAAAAAAACMUWdl9IU19sUFFRLVdlSUM3bVJKWFR2UQACFlZTUGZOT1o2VGptal8zVEZwTWtZb0EAABZkQWp0YVgyWFRRZWYySkphYWFveWd3AAA=",
    "keep_alive": "5m"
  },
  "sort": [
    {"@timestamp": {"order": "asc"}}
  ],
  "search_after": [
    "1623081497686",
    1853
  ],
  "track_total_hits": false
}
This returned the next page, whose last hit's sort values I would again have to paste into the search_after parameter by hand.
My question is: is there a way, using NEST, to obtain a PIT ID and to extract the sort values of the previous page's last hit, so that the system knows which value to search after instead of my manually copying and pasting that value into the search_after parameter, all while keeping the same PIT ID alive?
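The core of what is being asked, carrying the last hit's sort values into the next request until nothing comes back, does not depend on NEST itself. A minimal client-agnostic sketch follows; Hit<T>, SearchAfterPager, FetchAllAsync and the fetchPage delegate are hypothetical names, and fetchPage is assumed to run one search (adding search_after when its argument is not null) and return the hits together with their sort values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical pairing of a document with the "sort" array Elasticsearch returns for it.
public sealed class Hit<T>
{
    public Hit(T document, object[] sort)
    {
        Document = document;
        Sort = sort;
    }

    public T Document { get; }
    public object[] Sort { get; }
}

public static class SearchAfterPager
{
    // Calls fetchPage repeatedly. The first call gets null (no search_after);
    // every later call gets the sort values of the previous page's last hit.
    // Stops as soon as a page comes back empty.
    public static async Task<List<T>> FetchAllAsync<T>(
        Func<object[], Task<IReadOnlyList<Hit<T>>>> fetchPage)
    {
        var all = new List<T>();
        object[] searchAfter = null;

        while (true)
        {
            IReadOnlyList<Hit<T>> page = await fetchPage(searchAfter);
            if (page.Count == 0)
                break;

            all.AddRange(page.Select(h => h.Document));
            searchAfter = page[page.Count - 1].Sort;
        }

        return all;
    }
}
```

With NEST, fetchPage would wrap a single SearchAsync call and build each Hit<T> from IHit<T>.Source and IHit<T>.Sorts.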
This is what I currently have in NEST:
var response = await _elasticClient.SearchAsync<EsSource>(s => s
    .Size(3000) // must see about this
    .Source(src => src.Includes(i => i
        .Fields(f => f.timestamp,
                fields => fields.messageTemplate,
                fields => fields.message)))
    .Index("customer-simulation-es-app-logs*")
    .Query(q => +q
        .DateRange(dr => dr
            .Field("@timestamp")
            .GreaterThanOrEquals("2021-06-12T16:39:32.727-05:00")
            .LessThanOrEquals(DateTime.Now))));
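One possible way to extend this query into a PIT plus search_after loop is sketched below. It is a minimal sketch, not a verified implementation: it assumes a NEST 7.x version exposing OpenPointInTimeAsync, SearchDescriptor.PointInTime(id, selector) and ClosePointInTimeAsync (check these against your client), and the EsSource shape, LogExporter class and FetchAllAsync method are hypothetical. The search deliberately has no .Index(...), because Elasticsearch rejects a PIT search that also names indices. IHit<T>.Sorts contains every sort value of a hit, including the tiebreaker that PIT searches add implicitly (the 1853 in the console request above appears to be that value), so copying it whole into SearchAfter avoids hand-editing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Nest;

// Hypothetical stand-in for the question's EsSource document class.
public class EsSource
{
    [PropertyName("@timestamp")]
    public DateTime timestamp { get; set; }
    public string messageTemplate { get; set; }
    public string message { get; set; }
}

public static class LogExporter
{
    private const string IndexPattern = "customer-simulation-es-app-logs*";
    private const string KeepAlive = "5m";

    public static async Task<List<EsSource>> FetchAllAsync(IElasticClient client)
    {
        // Open the PIT once; its Id replaces the manually copied "pit.id".
        var pit = await client.OpenPointInTimeAsync(IndexPattern, d => d.KeepAlive(KeepAlive));
        if (!pit.IsValid)
            throw new InvalidOperationException(pit.DebugInformation);

        var until = DateTime.Now;    // fixed upper bound so every page runs the same query
        var results = new List<EsSource>();
        object[] searchAfter = null; // no search_after on the first page

        try
        {
            while (true)
            {
                var response = await client.SearchAsync<EsSource>(s =>
                {
                    s.Size(3000)
                     // No .Index(...): the PIT already determines which indices are searched.
                     .PointInTime(pit.Id, p => p.KeepAlive(KeepAlive)) // also extends the PIT
                     .Source(src => src.Includes(i => i
                         .Fields(f => f.timestamp, f => f.messageTemplate, f => f.message)))
                     .Query(q => +q
                         .DateRange(dr => dr
                             .Field("@timestamp")
                             .GreaterThanOrEquals("2021-06-12T16:39:32.727-05:00")
                             .LessThanOrEquals(until)))
                     .Sort(so => so.Ascending("@timestamp"))
                     .TrackTotalHits(false);

                    // From the second page on, continue after the previous page's last hit.
                    return searchAfter == null ? s : s.SearchAfter(searchAfter);
                });

                if (!response.IsValid)
                    throw new InvalidOperationException(response.DebugInformation);
                if (response.Hits.Count == 0)
                    break; // past the last page

                results.AddRange(response.Documents);
                // Sort values of the last hit, tiebreaker included.
                searchAfter = response.Hits.Last().Sorts.ToArray();
            }
        }
        finally
        {
            // Release the PIT instead of waiting for keep_alive to expire.
            await client.ClosePointInTimeAsync(c => c.Id(pit.Id));
        }

        return results;
    }
}
```

Passing the same PIT ID with a keep_alive on every request keeps that PIT alive, which matches what the question asks for. The Elasticsearch documentation does recommend always sending the most recent pit_id returned by a search, so if your NEST version exposes that value on the search response, using it would be the safer choice.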