Hi All
I am looking for some help in understanding how I can debug/find slow performance issues and then resolve them.
We have a dashboard presenting some visualisations on an index with 29,369,877 documents and a storage size of 51.5 GB.
The dashboard works OK when the time filter is set to 1 day, but as soon as we stretch it to 7 or 30 days it slows to a crawl.
The dashboard has about 10 TSVB & metrics visualisations and 7 Document Table visualisations.
Regards
Tommy
Any help here on this query?
It is not the dashboard that is slow, it is the queries that it needs to run in your Elasticsearch cluster.
You need to provide more information about your cluster.
What are the specs of your cluster? What is the disk type, SSD or HDD? Does this index have replicas? Are you using runtime fields?
How many aggregations do your document tables have? What kind of metric are you using: count, average, unique?
Can you share a screenshot of your dashboard?
What are the specs of your cluster? It is a 5-node cluster with 1,420 indices and a JVM heap of 88.3 GB / 155.0 GB. Please let me know what else is needed.
What is the disk type, SSD or HDD? The cluster is running on SSD.
Does this index have replicas? Yes, it has 1 primary and 1 replica.
Are you using runtime fields? Yes, I can see 11 runtime fields defined in the index mapping.
Each document, according to the index pattern, has up to 78 fields (including those 11 runtime ones).
How many aggregations do your document tables have? Two tables have a count metric with 5 split rows and 1 split column, one table has count with 1 split row and 1 split column bucket, and the rest are not using aggregations.
What kind of metric are you using: count, average, unique? Count on the tables, but sum, unique count, count and max are all used in other visualisations on the dashboard.
Can you share a screenshot of your dashboard? Unfortunately not, it is internal company data that cannot be shared.
Can you share your mappings? Are your visualizations using those runtime fields?
Runtime fields impact search performance, as mentioned in the documentation:
Runtime fields use less disk space and provide flexibility in how you access your data, but can impact search performance based on the computation defined in the runtime script.
I think that this may be the issue.
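One way to confirm it is to run the kind of aggregation your dashboard uses directly from Dev Tools with profiling enabled, once against a runtime field and once against a plain mapped field, and compare the took times over the slow 30-day range. A rough sketch, where my-index is a placeholder for your index name (the @timestamp, open_patch_count and disposition fields are taken from your mapping):

# Aggregation on a runtime field over the slow time range, with profiling enabled.
GET my-index/_search
{
  "size": 0,
  "profile": true,
  "query": {
    "range": { "@timestamp": { "gte": "now-30d", "lte": "now" } }
  },
  "aggs": {
    "open_patches": { "sum": { "field": "open_patch_count" } }
  }
}

# The same request against a regular mapped keyword field, for comparison.
GET my-index/_search
{
  "size": 0,
  "profile": true,
  "query": {
    "range": { "@timestamp": { "gte": "now-30d", "lte": "now" } }
  },
  "aggs": {
    "dispositions": { "terms": { "field": "disposition" } }
  }
}

If the first request is much slower than the second, the runtime fields are the bottleneck.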
Can you share your mappings? Are your visualizations using those runtime fields?
I have attached the mapping object below, and yes, the visualisation objects are using these fields.
{
"mappings": {
"_doc": {
"dynamic_templates": [],
"runtime": {
"close_patch_count": {
"type": "long",
"script": {
"source": "if(doc['disposition'].value=='Closed' && doc['advisory_id'].size()!=0){\n\n emit(1)\n\n} else{\n\n emit(0)\n\n}",
"lang": "painless"
}
},
"coming_due_day_range": {
"type": "keyword",
"script": {
"source": "if(doc['overdue'].size()!=0 && doc['overdue'].value=='N'){\n\n emit(doc['day_range'].value)\n\n}",
"lang": "painless"
}
},
"coming_dues_count": {
"type": "long",
"script": {
"source": "if(doc['overdue'].size()!=0 && doc['overdue'].value=='N'){\n\n emit(1)\n\n} else{\n\n emit(0)\n\n}",
"lang": "painless"
}
},
"day_range": {
"type": "keyword",
"script": {
"source": "if(doc['due_days'].size()!=0) {\n\n double days_overdue;\ndays_overdue = Math.abs(doc['due_days'].value);\n\nif(days_overdue >= 0 && days_overdue <= 30) {\n emit('0 to 30')\n} else if (days_overdue > 30 && days_overdue <= 60) {\n emit('31 to 60')\n} else if (days_overdue > 60 && days_overdue<=90) {\n emit('61 to 90')\n} else if (days_overdue > 90) {\n emit('91+')\n}}",
"lang": "painless"
}
},
"due_days": {
"type": "long",
"script": {
"source": "if (doc['disposition'].value == 'Open' && !doc['effective_target_date'].empty && doc['effective_target_date'].size() !=0){\n long due_days;\n Instant instant = Instant.ofEpochMilli(new Date().getTime());\n ZonedDateTime now = ZonedDateTime.ofInstant(instant,ZoneId.of('Z'));\n due_days = now.until(doc['effective_target_date'].value, ChronoUnit.DAYS);\n long timestampLog = doc['effective_target_date'].value.getMillis();\n long timestampNow = new Date().getTime();\n if (due_days==0 && timestampNow>timestampLog){\n\temit(due_days-1);\n }else{\n emit(due_days);\n }\n}",
"lang": "painless"
}
},
"no_target_date_count": {
"type": "long",
"script": {
"source": "if(doc['disposition'].value == 'Open' && doc['effective_target_date'].size()==0){\n\n emit(1)\n\n} else{\n\n emit(0)\n\n}",
"lang": "painless"
}
},
"open_patch_count": {
"type": "long",
"script": {
"source": "if(doc['disposition'].value == 'Open' && doc['advisory_id'].size()!=0){\n\n emit(1)\n\n} else{\n\n emit(0)\n\n}",
"lang": "painless"
}
},
"overdue": {
"type": "keyword",
"script": {
"source": "if(doc['disposition'].value == 'Open' && doc['due_days'].size()!=0){\n\n long days_overdue = doc['due_days'].value;\n\nif(days_overdue<0){\n\n emit('Y');\n\n} else{\n\n emit('N');\n\n}}",
"lang": "painless"
}
},
"overdue_day_range": {
"type": "keyword",
"script": {
"source": "if(doc['overdue'].size()!=0 && doc['overdue'].value=='Y'){\n\n emit(doc['day_range'].value)\n\n}",
"lang": "painless"
}
},
"overdue_patch_count": {
"type": "long",
"script": {
"source": "if(doc['overdue'].size()!=0 && doc['overdue'].value=='Y' && doc['advisory_id'].size()!=0){\n\n emit(1)\n\n} else{\n\n emit(0)\n\n}",
"lang": "painless"
}
},
"total_patch_count": {
"type": "long",
"script": {
"source": "if(!doc['advisory_id'].empty && doc['advisory_id'].size()!=0){\n\n emit(1)\n\n} else{\n\n emit(0)\n\n}",
"lang": "painless"
}
}
},
"properties": {
"@timestamp": {
"type": "date"
},
"advisory_id": {
"type": "keyword"
},
"advisory_id_normalized": {
"type": "keyword"
},
"advisory_id_numeric": {
"type": "long"
},
"archived_at": {
"type": "date_nanos"
},
"client_rating": {
"type": "keyword"
},
"closed_at": {
"type": "date_nanos"
},
"closed_by_reference": {
"type": "keyword"
},
"closed_by_type": {
"type": "keyword"
},
"computer_uuid": {
"type": "keyword"
},
"country": {
"type": "keyword"
},
"created_at": {
"type": "date_nanos"
},
"customer_uuid": {
"type": "keyword"
},
"cves": {
"type": "keyword"
},
"deferred_count": {
"type": "integer"
},
"disposition": {
"type": "keyword"
},
"effective_target_date": {
"type": "date_nanos"
},
"geo": {
"type": "keyword"
},
"group_name": {
"type": "keyword"
},
"gsma_code": {
"type": "keyword"
},
"industry": {
"type": "keyword"
},
"ingest_timestamp": {
"type": "date_nanos"
},
"installed_at": {
"type": "date_nanos"
},
"is_mf": {
"type": "boolean"
},
"kyndryl_rating": {
"type": "keyword"
},
"market": {
"type": "keyword"
},
"max_cve": {
"type": "keyword"
},
"max_cvss_rating": {
"type": "keyword"
},
"max_cvss_score": {
"type": "keyword"
},
"offering_code": {
"type": "keyword"
},
"org_market": {
"type": "keyword"
},
"patch_status": {
"type": "keyword"
},
"patch_status_id": {
"type": "keyword"
},
"product": {
"type": "keyword"
},
"product_alias": {
"type": "keyword"
},
"product_alias_version": {
"type": "keyword"
},
"rcp_customer_name": {
"type": "keyword"
},
"record_version": {
"type": "integer"
},
"record_version_modified_at": {
"type": "date_nanos"
},
"reference_tags": {
"type": "keyword"
},
"release_date": {
"type": "date_nanos"
},
"required_target_date": {
"type": "date_nanos"
},
"revised_target_date": {
"type": "date_nanos"
},
"revised_target_date_count": {
"type": "integer"
},
"sector": {
"type": "keyword"
},
"service_area": {
"type": "keyword"
},
"servicing_boundary": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"strategic_market": {
"type": "keyword"
},
"threatcon": {
"type": "keyword"
},
"threatcon_tag": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"updated_at": {
"type": "date_nanos"
},
"vendors_rating": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"vendors_release_date": {
"type": "date"
}
}
}
}
}
@leandrojmp did you get a chance to look into this?
Hello,
As I said before, the issue is probably caused by the runtime fields; they have a big impact on search performance.
What happens if you go to Discover and set the time range to show 30 days of your logs? Is it also slow to show the documents?
I'm not sure there is much that can be done to improve the performance besides not using runtime fields.
From what I understand of your mappings, you are creating runtime fields based on the values of other fields. Could you move this logic to an ingest pipeline and stop using the runtime fields?
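For example, the open/closed patch count flags could be computed once at index time with a script processor instead of on every search. A rough sketch, not tested against your data; the pipeline name patch_flags and the index name my-index are placeholders, and the logic mirrors what your runtime scripts already do:

# Hypothetical pipeline that precomputes the flag fields when a document is indexed.
PUT _ingest/pipeline/patch_flags
{
  "description": "Precompute patch count flags at ingest time",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": """
          // In an ingest script, ctx is the source document being indexed.
          boolean hasAdvisory = ctx.advisory_id != null;
          ctx.total_patch_count = hasAdvisory ? 1 : 0;
          ctx.open_patch_count  = (ctx.disposition == 'Open'   && hasAdvisory) ? 1 : 0;
          ctx.close_patch_count = (ctx.disposition == 'Closed' && hasAdvisory) ? 1 : 0;
        """
      }
    }
  ]
}

# Make it the default pipeline for the index so every new document gets the fields.
PUT my-index/_settings
{
  "index.default_pipeline": "patch_flags"
}

Existing documents would need to be run through the pipeline as well, for example with POST my-index/_update_by_query?pipeline=patch_flags or by reindexing.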
I agree with @leandrojmp. I have found the query performance impact of runtime fields to be very significant. If query performance is important you will want to create those fields at ingest time. An ingest pipeline can be used to achieve this.
You say that the index has a single primary shard, but you have 5 nodes. Are these all data nodes? Something that will improve performance is increasing the number of shards to equal the number of data nodes. This will spread the load across all of the nodes, which should provide an improvement.
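The number of primary shards cannot be changed on an existing index, so this would mean reindexing into a new index (or using the split API). A rough sketch, with placeholder index names; the new index would also need the mappings from the original:

# New index with one primary shard per data node, plus one replica each.
PUT my-index-v2
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    }
  }
}

# Copy the existing documents into it.
POST _reindex
{
  "source": { "index": "my-index" },
  "dest": { "index": "my-index-v2" }
}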
Hey both, thanks for the replies. I'll make the changes as you have suggested; we may have to leave 1 or 2 runtime fields in place, as they use the current date to calculate a count of days to a target date.
For reference: when I open the index in Discover, the time to render the default 1000 documents is only a few seconds.