Hello @Patrick_Whelan,
Thanks for your help on this subject.
I could be wrong, but I would think the timestamps should be closer, at least within the frequency range.
I'm not sure I get your point, but isn't it due to optimizations from align_checkpoints? (defaulting to true according to the Create transform API)
If I understood correctly, it skips intermediate runs (and thus updates) until a search can build a full date_histogram.fixed_interval bucket. At least, that would explain why the checkpoint number only increased by 2 in the last stats, as only 2 full-bucket runs took place...
My bet is that last_search_time will increase at the pace of frequency while timestamp_millis gets aligned to the start of the day because of date_histogram.fixed_interval = 1d and align_checkpoints = true.
That's not proof, but all my 1day jobs have a timestamp_millis between 00:00:00 UTC and 01:00:00 UTC (even before 00:20:00 UTC...) and checkpoint = 52 (all jobs were created the same day).
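To illustrate the alignment I have in mind, here is a minimal Python sketch. It is only my assumption of what align_checkpoints does (round the checkpoint's upper bound down to a multiple of the date_histogram interval), not the actual Elasticsearch implementation, and it ignores any sync delay. The function name aligned_upper_bound is made up for the example; the input values are the timestamp_millis of the two jobs in the stats output of this post.

```python
# Assumption: with align_checkpoints = true, the checkpoint upper bound is
# the checkpoint time rounded down to a multiple of the bucket interval.
# This is an illustrative model, not code taken from Elasticsearch.

DAY_MS = 24 * 60 * 60 * 1000   # date_histogram.fixed_interval = 1d
FIVE_MIN_MS = 5 * 60 * 1000    # date_histogram.fixed_interval = 5m


def aligned_upper_bound(checkpoint_ms: int, interval_ms: int) -> int:
    """Round an epoch-millisecond timestamp down to the interval boundary."""
    return checkpoint_ms - (checkpoint_ms % interval_ms)


# timestamp_millis of the 1day job and of the 5min job
print(aligned_upper_bound(1723076170732, DAY_MS))
print(aligned_upper_bound(1723129404686, FIVE_MIN_MS))
```

For both jobs the result equals the time_upper_bound_millis reported by the stats API (1723075200000 and 1723129200000), which is at least consistent with the bucket-alignment explanation.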
Now that another day has gone by, can you check the stats API again to see what the output says?
{
"count": 2,
"transforms": [
{
"id": "report-geoplateforme.aggregate-datastore_offering_1day",
"state": "started",
"node": {
"id": "cQLBAs4ZTt2ec_193MBt8g",
"name": "elasticsearch-es-transform-0",
"ephemeral_id": "OsG4t35qSqa7DENOCAO6VQ",
"transport_address": "10.2.9.116:9300",
"attributes": {}
},
"stats": {
"pages_processed": 76343,
"documents_processed": 14976442333,
"documents_indexed": 36196047,
"documents_deleted": 0,
"trigger_count": 1213,
"index_time_in_ms": 3168780,
"index_total": 73179,
"index_failures": 0,
"search_time_in_ms": 120987217,
"search_total": 76343,
"search_failures": 0,
"processing_time_in_ms": 358867,
"processing_total": 76343,
"delete_time_in_ms": 0,
"exponential_avg_checkpoint_duration_ms": 1774628.6741051634,
"exponential_avg_documents_indexed": 266899.46864774165,
"exponential_avg_documents_processed": 289437424.9664737
},
"checkpointing": {
"last": {
"checkpoint": 52,
"timestamp_millis": 1723076170732,
"time_upper_bound_millis": 1723075200000
},
"changes_last_detected_at": 1723076170724,
"last_search_time": 1723126572722
},
"health": {
"status": "green"
}
},
{
"id": "report-geoplateforme.aggregate-datastore_offering_5min",
"state": "started",
"node": {
"id": "cQLBAs4ZTt2ec_193MBt8g",
"name": "elasticsearch-es-transform-0",
"ephemeral_id": "OsG4t35qSqa7DENOCAO6VQ",
"transport_address": "10.2.9.116:9300",
"attributes": {}
},
"stats": {
"pages_processed": 464460,
"documents_processed": 14780000275,
"documents_indexed": 193421268,
"documents_deleted": 2853220,
"trigger_count": 29013,
"index_time_in_ms": 17375903,
"index_total": 398515,
"index_failures": 0,
"search_time_in_ms": 102846428,
"search_total": 464460,
"search_failures": 0,
"processing_time_in_ms": 2186052,
"processing_total": 464460,
"delete_time_in_ms": 657404,
"exponential_avg_checkpoint_duration_ms": 8836.59149519649,
"exponential_avg_documents_indexed": 1962.3107466320855,
"exponential_avg_documents_processed": 2114815.2370356224
},
"checkpointing": {
"last": {
"checkpoint": 14531,
"timestamp_millis": 1723129404686,
"time_upper_bound_millis": 1723129200000
},
"changes_last_detected_at": 1723129404679,
"last_search_time": 1723129554726
},
"health": {
"status": "green"
}
}
]
}
Focusing on the 1day job, we have:
"timestamp_millis": 1723076170732,
"time_upper_bound_millis": 1723075200000
"last_search_time": 1723126572722
Converted to human-readable:
"timestamp_millis": 2024-08-08T00:16:10 UTC
"time_upper_bound_millis": 2024-08-08T00:00:00 UTC
"last_search_time": 2024-08-08T14:16:12 UTC
We should have progressed 1 checkpoint per hour since then, since the frequency is 1 hour, so anything less than that can give us an idea if it isn't keeping up.
We are looking at run 52 and the timestamp_millis shifted only by a few seconds compared to run 50 posted 2 days ago. I guess it's also linked to the point above about align_checkpoints (non full-bucket runs were skipped, so checkpoint was not increased that much...).
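For anyone who wants to redo the conversion above, this is roughly how I turned the epoch-millisecond values of the 1day job into UTC timestamps and measured the gap between the last checkpoint and the last search. It is a small standalone Python sketch; the helper name to_utc is mine, and the values are copied from the stats output.

```python
from datetime import datetime, timezone


def to_utc(epoch_ms: int) -> str:
    """Format an epoch-millisecond value as a UTC timestamp (second precision)."""
    moment = datetime.fromtimestamp(epoch_ms // 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S UTC")


# "checkpointing" values of the 1day job, copied from the stats API output
checkpointing = {
    "timestamp_millis": 1723076170732,
    "time_upper_bound_millis": 1723075200000,
    "last_search_time": 1723126572722,
}

for name, value in checkpointing.items():
    print(f"{name}: {to_utc(value)}")

# Time elapsed between the last completed checkpoint and the last search
gap_ms = checkpointing["last_search_time"] - checkpointing["timestamp_millis"]
print(f"whole hours between last checkpoint and last search: {gap_ms // 3_600_000}")
```

The gap comes out at about 14 hours. With a 1 hour frequency, that would mean the transform kept searching every hour without creating a new checkpoint, which is what I would expect if runs that cannot complete a full 1d bucket are skipped.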
Are there any errors in the logs, either for the Transform or for the search/index nodes where the requests may be running against the raw index?
I see no errors in Transform messages, only the job creation and successful checkpoints (1 by 1 at first and now only every 10 checkpoints since it reached 10).
That said, you pointed me down an interesting path with the node logs: even though it's only WARN, I see a lot of these in the transform-dedicated pod (most occurrences concern the index with partial data, but there are also 1 or 2 occurrences for each of the other 1day jobs):
{
"@timestamp": "2024-08-08T00:24:53.806Z",
"log.level": "WARN",
"message": "[report-geoplateforme.aggregate-datastore_offering_1day] Search context missing, falling back to normal search; request [apply_results]",
"ecs.version": "1.2.0",
"service.name": "ES_ECS",
"event.dataset": "elasticsearch.server",
"process.thread.name": "elasticsearch[elasticsearch-es-transform-0][transport_worker][T#3]",
"log.logger": "org.elasticsearch.xpack.transform.transforms.ClientTransformIndexer",
"elasticsearch.cluster.uuid": "KFRBIrwYSUCVZSbgwhYVWg",
"elasticsearch.node.id": "cQLBAs4ZTt2ec_193MBt8g",
"elasticsearch.node.name": "elasticsearch-es-transform-0",
"elasticsearch.cluster.name": "elasticsearch",
"error.type": "org.elasticsearch.action.search.SearchPhaseExecutionException",
"error.message": "Partial shards failure",
"error.stack_trace": "<FULL_STACK_BELLOW>"
}
Full stack here
Failed to execute phase [query], Partial shards failure; shardFailures {[S6SI9BUhTDmtO00OWkfi7g][.ds-report-geoplateforme.raw-2024.07.25-000206][0]: org.elasticsearch.transport.RemoteTransportException: [elasticsearch-es-data-warm-cold-content-2][10.2.178.20:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [4835916]
at org.elasticsearch.search.SearchService.findReaderContext(SearchService.java:910)
at org.elasticsearch.search.SearchService.createOrGetReaderContext(SearchService.java:924)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:671)
at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:549)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:73)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:70)
at org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:95)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.lang.Thread.run(Thread.java:1583)
}{[7ixH2vqrRYCkbqOebGB94A][.ds-report-geoplateforme.raw-2024.07.25-000210][0]: org.elasticsearch.transport.RemoteTransportException: [elasticsearch-es-data-warm-cold-content-0][10.2.183.164:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [4397587]
at org.elasticsearch.search.SearchService.findReaderContext(SearchService.java:910)
at org.elasticsearch.search.SearchService.createOrGetReaderContext(SearchService.java:924)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:671)
at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:549)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:73)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:70)
at org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:95)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.lang.Thread.run(Thread.java:1583)
}{[7ixH2vqrRYCkbqOebGB94A][.ds-report-geoplateforme.raw-2024.07.26-000214][0]: org.elasticsearch.transport.RemoteTransportException: [elasticsearch-es-data-warm-cold-content-0][10.2.183.164:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [4397586]
at org.elasticsearch.search.SearchService.findReaderContext(SearchService.java:910)
at org.elasticsearch.search.SearchService.createOrGetReaderContext(SearchService.java:924)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:671)
at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:549)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:73)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:70)
at org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:95)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.lang.Thread.run(Thread.java:1583)
}{[7ixH2vqrRYCkbqOebGB94A][.ds-report-geoplateforme.raw-2024.07.26-000216][0]: org.elasticsearch.transport.RemoteTransportException: [elasticsearch-es-data-warm-cold-content-0][10.2.183.164:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [4397588]
at org.elasticsearch.search.SearchService.findReaderContext(SearchService.java:910)
at org.elasticsearch.search.SearchService.createOrGetReaderContext(SearchService.java:924)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:671)
at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:549)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:73)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:70)
at org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:95)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.lang.Thread.run(Thread.java:1583)
}{[S6SI9BUhTDmtO00OWkfi7g][.ds-report-geoplateforme.raw-2024.07.26-000218][0]: org.elasticsearch.transport.RemoteTransportException: [elasticsearch-es-data-warm-cold-content-2][10.2.178.20:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [4835917]
at org.elasticsearch.search.SearchService.findReaderContext(SearchService.java:910)
at org.elasticsearch.search.SearchService.createOrGetReaderContext(SearchService.java:924)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:671)
at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:549)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:73)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:70)
at org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:95)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.lang.Thread.run(Thread.java:1583)
}{[7ixH2vqrRYCkbqOebGB94A][.ds-report-geoplateforme.raw-2024.07.27-000220][0]: org.elasticsearch.transport.RemoteTransportException: [elasticsearch-es-data-warm-cold-content-0][10.2.183.164:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [4397589]
at org.elasticsearch.search.SearchService.findReaderContext(SearchService.java:910)
at org.elasticsearch.search.SearchService.createOrGetReaderContext(SearchService.java:924)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:671)
at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:549)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:73)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:70)
at org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:95)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.lang.Thread.run(Thread.java:1583)
}{[S6SI9BUhTDmtO00OWkfi7g][.ds-report-geoplateforme.raw-2024.07.27-000222][0]: org.elasticsearch.transport.RemoteTransportException: [elasticsearch-es-data-warm-cold-content-2][10.2.178.20:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [4835914]
at org.elasticsearch.search.SearchService.findReaderContext(SearchService.java:910)
at org.elasticsearch.search.SearchService.createOrGetReaderContext(SearchService.java:924)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:671)
at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:549)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:73)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:70)
at org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:95)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.lang.Thread.run(Thread.java:1583)
}{[7ixH2vqrRYCkbqOebGB94A][.ds-report-geoplateforme.raw-2024.07.27-000224][0]: org.elasticsearch.transport.RemoteTransportException: [elasticsearch-es-data-warm-cold-content-0][10.2.183.164:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [4397590]
at org.elasticsearch.search.SearchService.findReaderContext(SearchService.java:910)
at org.elasticsearch.search.SearchService.createOrGetReaderContext(SearchService.java:924)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:671)
at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:549)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:73)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:70)
at org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:95)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.lang.Thread.run(Thread.java:1583)
}{[7ixH2vqrRYCkbqOebGB94A][.ds-report-geoplateforme.raw-2024.07.28-000226][0]: org.elasticsearch.transport.RemoteTransportException: [elasticsearch-es-data-warm-cold-content-0][10.2.183.164:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [4397593]
at org.elasticsearch.search.SearchService.findReaderContext(SearchService.java:910)
at org.elasticsearch.search.SearchService.createOrGetReaderContext(SearchService.java:924)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:671)
at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:549)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:73)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:70)
at org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:95)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.lang.Thread.run(Thread.java:1583)
}{[S6SI9BUhTDmtO00OWkfi7g][.ds-report-geoplateforme.raw-2024.07.28-000230][0]: org.elasticsearch.transport.RemoteTransportException: [elasticsearch-es-data-warm-cold-content-2][10.2.178.20:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [4835915]
at org.elasticsearch.search.SearchService.findReaderContext(SearchService.java:910)
at org.elasticsearch.search.SearchService.createOrGetReaderContext(SearchService.java:924)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:671)
at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:549)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:73)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:70)
at org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:95)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.lang.Thread.run(Thread.java:1583)
}{[S6SI9BUhTDmtO00OWkfi7g][.ds-report-geoplateforme.raw-2024.07.29-000236][0]: org.elasticsearch.transport.RemoteTransportException: [elasticsearch-es-data-warm-cold-content-2][10.2.178.20:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [4835918]
at org.elasticsearch.search.SearchService.findReaderContext(SearchService.java:910)
at org.elasticsearch.search.SearchService.createOrGetReaderContext(SearchService.java:924)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:671)
at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:549)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:73)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:70)
at org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:95)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.lang.Thread.run(Thread.java:1583)
}{[S6SI9BUhTDmtO00OWkfi7g][.ds-report-geoplateforme.raw-2024.07.30-000242][0]: org.elasticsearch.transport.RemoteTransportException: [elasticsearch-es-data-warm-cold-content-2][10.2.178.20:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [4835920]
at org.elasticsearch.search.SearchService.findReaderContext(SearchService.java:910)
at org.elasticsearch.search.SearchService.createOrGetReaderContext(SearchService.java:924)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:671)
at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:549)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:73)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:70)
at org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:95)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.lang.Thread.run(Thread.java:1583)
}{[S6SI9BUhTDmtO00OWkfi7g][.ds-report-geoplateforme.raw-2024.08.02-000262][0]: org.elasticsearch.transport.RemoteTransportException: [elasticsearch-es-data-warm-cold-content-2][10.2.178.20:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [4835919]
at org.elasticsearch.search.SearchService.findReaderContext(SearchService.java:910)
at org.elasticsearch.search.SearchService.createOrGetReaderContext(SearchService.java:924)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:671)
at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:549)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:73)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:70)
at org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:95)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.lang.Thread.run(Thread.java:1583)
}{[fH0nAB18SuO-30NtHK3cRA][.ds-report-geoplateforme.raw-2024.08.06-000290][0]: org.elasticsearch.transport.RemoteTransportException: [elasticsearch-es-data-hot-2][10.2.178.16:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [1957919]
at org.elasticsearch.search.SearchService.findReaderContext(SearchService.java:910)
at org.elasticsearch.search.SearchService.createOrGetReaderContext(SearchService.java:924)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:671)
at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:549)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:73)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:70)
at org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:95)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.lang.Thread.run(Thread.java:1583)
}{[fH0nAB18SuO-30NtHK3cRA][.ds-report-geoplateforme.raw-2024.08.07-000301][0]: org.elasticsearch.transport.RemoteTransportException: [elasticsearch-es-data-hot-2][10.2.178.16:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [1957921]
at org.elasticsearch.search.SearchService.findReaderContext(SearchService.java:910)
at org.elasticsearch.search.SearchService.createOrGetReaderContext(SearchService.java:924)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:671)
at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:549)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:73)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:70)
at org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:95)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.lang.Thread.run(Thread.java:1583)
}
at org.elasticsearch.server@8.13.4/org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:712)
at org.elasticsearch.server@8.13.4/org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:418)
at org.elasticsearch.server@8.13.4/org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:744)
at org.elasticsearch.server@8.13.4/org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:497)
at org.elasticsearch.server@8.13.4/org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:335)
at org.elasticsearch.server@8.13.4/org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:62)
at org.elasticsearch.server@8.13.4/org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:73)
at org.elasticsearch.server@8.13.4/org.elasticsearch.action.DelegatingActionListener.onFailure(DelegatingActionListener.java:31)
at org.elasticsearch.server@8.13.4/org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:53)
at org.elasticsearch.server@8.13.4/org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:634)
at org.elasticsearch.server@8.13.4/org.elasticsearch.transport.TransportService$UnregisterChildTransportResponseHandler.handleException(TransportService.java:1751)
at org.elasticsearch.server@8.13.4/org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1475)
at org.elasticsearch.server@8.13.4/org.elasticsearch.transport.InboundHandler.doHandleException(InboundHandler.java:475)
at org.elasticsearch.server@8.13.4/org.elasticsearch.transport.InboundHandler.handleException(InboundHandler.java:462)
at org.elasticsearch.server@8.13.4/org.elasticsearch.transport.InboundHandler.handlerResponseError(InboundHandler.java:453)
at org.elasticsearch.server@8.13.4/org.elasticsearch.transport.InboundHandler.executeResponseHandler(InboundHandler.java:145)
at org.elasticsearch.server@8.13.4/org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:122)
at org.elasticsearch.server@8.13.4/org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:96)
at org.elasticsearch.server@8.13.4/org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:821)
at org.elasticsearch.server@8.13.4/org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:124)
at org.elasticsearch.server@8.13.4/org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:96)
at org.elasticsearch.server@8.13.4/org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:61)
at org.elasticsearch.transport.netty4@8.13.4/org.elasticsearch.transport.netty4.Netty4MessageInboundHandler.channelRead(Netty4MessageInboundHandler.java:48)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.codec@4.1.94.Final/io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.handler@4.1.94.Final/io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1383)
at io.netty.handler@4.1.94.Final/io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1246)
at io.netty.handler@4.1.94.Final/io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1295)
at io.netty.codec@4.1.94.Final/io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
at io.netty.codec@4.1.94.Final/io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
at io.netty.codec@4.1.94.Final/io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.transport@4.1.94.Final/io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.transport@4.1.94.Final/io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.transport@4.1.94.Final/io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
at io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689)
at io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652)
at io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
at io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.common@4.1.94.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [4835916]
at org.elasticsearch.search.SearchService.findReaderContext(SearchService.java:910)
at org.elasticsearch.search.SearchService.createOrGetReaderContext(SearchService.java:924)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:671)
at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:549)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:73)
at org.elasticsearch.action.ActionRunnable$3.accept(ActionRunnable.java:70)
at org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:95)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.lang.Thread.run(Thread.java:1583)
From what I read in Continuous transform failures, this is not supposed to be an issue by itself, and I don't see any "irrecoverable failure". However, I don't have enough log history to confirm that nothing worse than those WARN logs happened during the affected days. I will try to fix the log collection process on my cluster and come back with more details if I manage to record a new occurrence.
As a fallback, is there any way to surface the WARN events in the transform messages? I don't see why an INFO log such as "Finished indexing for transform checkpoint [NN]." shows up there while none of the WARN logs do.
In other words: is there a way to make WARN messages end up in the .transform-notifications index as well?
My understanding is that the values get updated via a rewrite, kinda like a "purge and replace." For example, the 1 day search will search over the current day's data every 1hr and write the results to the destination index. The next hour it runs, it will redo the search, and any new data will be aggregated with the old data since it is the same day. So it's effectively the same as data_transfer[bucket] += data_transfer[event].
I fear this is bad news for my post-incident recovery scenario: I only have 2 weeks of raw data history, so adding missing events after that delay could produce a bucket containing only those events, which would do more harm than good.
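To make that concern concrete, here is a toy model of the "purge and replace" behavior described in the quote. It is only a sketch and not Elasticsearch code: recompute_bucket, run_checkpoint and the in-memory dest_index are hypothetical names, and it assumes a checkpoint overwrites the destination document of every changed bucket with a fresh aggregation of whatever raw events are still stored.

```python
from datetime import date


def recompute_bucket(raw_events, day):
    """Sum data_transfer over every raw event still stored for `day`."""
    return sum(value for event_day, value in raw_events if event_day == day)


def run_checkpoint(dest_index, raw_events, changed_days):
    """Overwrite each changed bucket with a fresh aggregation of the raw data."""
    for day in changed_days:
        dest_index[day] = recompute_bucket(raw_events, day)
    return dest_index


day = date(2024, 5, 1)
dest_index = {}

# Normal operation: the same day is re-aggregated as new events arrive.
raw_events = [(day, 100), (day, 250)]
run_checkpoint(dest_index, raw_events, {day})
after_first_run = dest_index[day]   # 350

raw_events.append((day, 50))
run_checkpoint(dest_index, raw_events, {day})
after_second_run = dest_index[day]  # 400, same result as an incremental +=

# Recovery after retention: the original raw events have expired and a
# single late event for that day is ingested afterwards.
raw_events = [(day, 7)]
run_checkpoint(dest_index, raw_events, {day})
after_late_event = dest_index[day]  # 7, the earlier total of 400 is lost
```

Under these assumptions the rewrite only matches an incremental sum while the full raw data for the bucket is still available, which is exactly the case I would need to verify with live tests.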
I may look into this with some live tests later and open a separate thread if I run into any unexpected or hazardous behavior.
Sorry for mixing topics in this already headache-prone thread. Let's focus on finding why some buckets are nearly empty; I think the WARN logs above put us on the right track.