I do not know ntopng, so I do not know how it indexes data or what its performance characteristics are.
I asked the ntopng engineer, who told me I should tune my ES Java VM to increase ES ingestion speed.
But I don't know how to tune it.
Did he tell you what bulk size they use?
What kind of storage do you have in your cluster? What do disk I/O and iowait look like during indexing?
From what I have seen so far it does not look like Elasticsearch is struggling or being the bottleneck.
I had a quick look at the ntopng code, and it looks to me (it has been a while since I used C++, so I could be reading it wrong...) like they send requests that are no larger than 16 kB in size. If that is the case, it could mean quite small bulk requests (especially if events are not very small), which could be inefficient, especially if indexing is not multi-threaded.
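To make the concern concrete, here is a rough back-of-the-envelope sketch in Python. The 16 kB cap is only my reading of the ntopng code and may be wrong; the 1,000-byte event size and the 100-byte action line are illustrative assumptions, not measurements.

```python
def docs_per_bulk(max_request_bytes: int, doc_bytes: int, action_bytes: int = 100) -> int:
    """Return how many documents fit in one bulk request of the given size.

    Each document in a bulk body costs one action line plus one source line,
    and each of those lines ends with a newline character.
    """
    per_doc = action_bytes + 1 + doc_bytes + 1
    return max_request_bytes // per_doc


if __name__ == "__main__":
    # Assumed 16 kB cap versus a 2 MB request, both with ~1,000-byte events.
    print(docs_per_bulk(16 * 1024, 1000))
    print(docs_per_bulk(2 * 1024 * 1024, 1000))
```

Under these assumptions a 16 kB request carries only about a dozen events, so the per-request overhead is paid far more often than with requests in the megabyte range.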
That depends on the design of the application indexing into Elasticsearch, in this case ntopng.
You might be able to get better performance by updating your index template to have only 1 primary shard instead of the default 5 (if you are still using the default) and possibly also by increasing the refresh interval.
Sorry, how do I update my index template?
These are my settings; my indices have names such as ntopng-2017.11.06.
Are my settings correct?
Thank you in advance.
PUT /ntopng-*/_settings
{
"index" : {
"refresh_interval" : "5s"
}
}
PUT _template/template_1
{
"template": "ntopng-*",
"settings": {
"number_of_shards": 1
}
}
It still drops flows.
What does a typical document look like?
Here is a document:
{
"_index": "ntopng-2017.11.06",
"_type": "ntopng",
"_id": "AV-Q0VuKEFis5A4jOVHT",
"_score": 1,
"_source": {
"@timestamp": "2017-11-06T10:10:43.0Z",
"type": "ntopng",
"IN_SRC_MAC": "00:00:00:00:00:00",
"OUT_DST_MAC": "00:00:00:00:00:00",
"IPV4_SRC_ADDR": "120.127.163.167",
"IPV4_DST_ADDR": "13.228.12.182",
"L4_SRC_PORT": 43176,
"L4_DST_PORT": 443,
"PROTOCOL": 6,
"L7_PROTO": 178,
"L7_PROTO_NAME": "SSL.Amazon",
"TCP_FLAGS": 0,
"IN_PKTS": 18,
"IN_BYTES": 5107,
"OUT_PKTS": 0,
"OUT_BYTES": 0,
"FIRST_SWITCHED": 1509962742,
"LAST_SWITCHED": 1509963043,
"json": {
"5": "0",
"10": "130",
"14": "0",
"15": "120.127.163.1",
"16": "0",
"17": "0",
"130": "120.127.163.4"
},
"CLIENT_NW_LATENCY_MS": 0,
"SERVER_NW_LATENCY_MS": 0,
"SRC_IP_COUNTRY": "TW",
"SRC_IP_LOCATION": [
121.496597,
25.0418
],
"DST_IP_COUNTRY": "US",
"DST_IP_LOCATION": [
-122.342201,
47.634399
],
"NTOPNG_INSTANCE_NAME": "DESKTOP-L1VSVD3",
"INTERFACE": "tcp://127.0.0.1:2055"
},
"fields": {
"@timestamp": [
1509963043000
]
}
}
The fact that flows are still dropped does not mean that Elasticsearch is saturated and the bottleneck. It could also be that ntopng is indexing into Elasticsearch in an inefficient manner and therefore is not able to keep up with the traffic.
In order to index efficiently into Elasticsearch, the application should ideally:
- Use persistent connections for the HTTP interface
- Use bulk requests of an appropriate size, generally no more than a couple of MB in size
- Send multiple bulk indexing requests in parallel
Logstash, Beats and our benchmarking tool Rally all do this, so you could use one of these to run a test to see how much your node is able to handle. That would tell you whether Elasticsearch is the bottleneck or not.
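The three points above can be sketched in Python using only the standard library. This is an illustration under assumptions, not how ntopng, Logstash or Beats are implemented: the function names (`bulk_bodies`, `send_bulk`, `index_parallel`), the `localhost:9200` address, the 2 MB limit and the worker count are all hypothetical choices.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor
from http.client import HTTPConnection
from typing import Callable, Iterable, Iterator, List

_local = threading.local()  # holds one persistent connection per worker thread


def bulk_bodies(docs: Iterable[dict], index: str, doc_type: str,
                max_bytes: int = 2 * 1024 * 1024) -> Iterator[bytes]:
    """Yield newline-delimited bulk bodies of at most max_bytes each
    (a single document larger than max_bytes still gets its own body)."""
    action = json.dumps({"index": {"_index": index, "_type": doc_type}})
    chunk: List[bytes] = []
    size = 0
    for doc in docs:
        entry = (action + "\n" + json.dumps(doc) + "\n").encode("utf-8")
        if chunk and size + len(entry) > max_bytes:
            yield b"".join(chunk)
            chunk, size = [], 0
        chunk.append(entry)
        size += len(entry)
    if chunk:
        yield b"".join(chunk)


def send_bulk(body: bytes, host: str = "localhost", port: int = 9200) -> int:
    """POST one bulk body over this thread's persistent connection.

    Returns the HTTP status only; a real client should also inspect the
    "errors" flag in the response body for per-item failures.
    """
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = _local.conn = HTTPConnection(host, port)
    conn.request("POST", "/_bulk", body=body,
                 headers={"Content-Type": "application/x-ndjson"})
    response = conn.getresponse()
    response.read()  # drain the response so the connection can be reused
    return response.status


def index_parallel(bodies: Iterable[bytes],
                   send: Callable[[bytes], int] = send_bulk,
                   workers: int = 4) -> List[int]:
    """Send bulk bodies from several threads and return their statuses in order.

    pool.map submits every body up front, so this suits bounded batches.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, bodies))
```

A call might look like `index_parallel(bulk_bodies(events, "ntopng-2017.11.06", "ntopng"))`, where `events` is any iterable of flow documents. Passing a different `send` function makes it possible to try the batching logic without a running cluster.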
OK, I got it!
I think that if I can't tune the way ntopng indexes, the only thing I can do is improve ES performance.
This is the reference I found:
https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-indexing-speed.html
I ran the commands below in Dev Tools in Kibana:
PUT /ntopng-*/_settings
{
"index" : {
"refresh_interval" : "30s"
}
}
PUT _template/template_1
{
"template": "elasticsearch",
"settings": {
"number_of_shards": 5
}
}
and then called the REST API:
GET _nodes/process?pretty
Output:
{
"_nodes": {
"total": 1,
"successful": 1,
"failed": 0
},
"cluster_name": "elasticsearch",
"nodes": {
"Ku2xA7CoTzec-5hieGRJgg": {
"name": "Ku2xA7C",
"transport_address": "127.0.0.1:9300",
"host": "127.0.0.1",
"ip": "127.0.0.1",
"version": "5.6.2",
"build_hash": "57e20f3",
"roles": [
"master",
"data",
"ingest"
],
"attributes": {
"ml.max_open_jobs": "10",
"ml.enabled": "true"
},
"process": {
"refresh_interval_in_millis": 1000,
"id": 8528,
"mlockall": false
}
}
}
}
The refresh_interval_in_millis value didn't change.
Are the commands I typed correct?
Or is there an ES config setting I can tune instead?
Thank you in advance!
That is the right guide, but I would expect any gains from this to be small compared to improving the indexing process. I looked at the ntopng repository, and it seems like you have added feedback to this issue.
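On the refresh_interval_in_millis question: as far as I know, the value under "process" in GET _nodes/process is how often the node refreshes its process statistics, not the index refresh interval, so it is not expected to change. The index-level setting can be read back with GET /ntopng-*/_settings. Below is a minimal Python sketch that pulls the value out of such a response; the sample response is hand-written for illustration and is not taken from this cluster, and `refresh_intervals` is a hypothetical helper name.

```python
import json
from typing import Dict, Optional


def refresh_intervals(settings_response: dict) -> Dict[str, Optional[str]]:
    """Map each index name to its index.refresh_interval.

    The value is None when the index has no explicit setting.
    """
    return {
        name: body.get("settings", {}).get("index", {}).get("refresh_interval")
        for name, body in settings_response.items()
    }


# Hand-written sample in the shape returned by GET /ntopng-*/_settings.
SAMPLE = json.loads("""
{
  "ntopng-2017.11.06": {
    "settings": {"index": {"refresh_interval": "30s", "number_of_shards": "5"}}
  },
  "ntopng-2017.11.07": {
    "settings": {"index": {"number_of_shards": "5"}}
  }
}
""")

if __name__ == "__main__":
    print(refresh_intervals(SAMPLE))
```

Note also that index templates only apply to indices created after the template is stored, and that the "template" field is matched against index names, so a pattern of "elasticsearch" would not match ntopng-* indices.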
Instead of writing directly to Elasticsearch, you could try sending the data through Logstash using their Logstash integration, as this may have a more efficient implementation (I have not tested it or looked at the source code). Logstash is very flexible and can be tuned extensively, although default settings are quite efficient and sufficient for many use cases.
That sounds great!
I will try it in the next few days.
Thank you :)