Elasticsearch 5.6.4 on RPI 3+


(bruno.dev) #1

Hey, I have been struggling to find the right settings for a while and have some heap issues. Since Jan 1st I have been creating daily logstash indices and am now running around 650 shards!

Daily indices are around 3 MB.

My config is a single node with:

bootstrap.memory_lock: true
xpack.ml.enabled: false
action.auto_create_index: true
xpack.monitoring.enabled: false
xpack.security.enabled: false

The Java heap is -Xms470m -Xmx470m.

I would like to keep only the last 5 days of indices in memory, for instance, and have the rest on disk at rest. My usage is mostly querying data from the same or previous day, very rarely going further back.

When I go below that Java heap I get errors and Elasticsearch crashes. Using 470 MB, I think the Pi will get very hot within a week... I can see the CPU usage increasing.

What would you suggest as a parameter config? (Truncated stats output below.)

{
  "_shards": {
    "total": 1254,
    "successful": 625,
    "failed": 0
  },
  "_all": {
    "primaries": {
      "docs": {
        "count": 1810472,
        "deleted": 515911
      },
      "store": {
        "size_in_bytes": 762656043,
        "throttle_time_in_millis": 0
      },
      "indexing": {
        "index_total": 8132,
        "index_time_in_millis": 685743,
        "index_current": 0,
        "index_failed": 15
      },
      "get": {
        "total": 3334,
        "time_in_millis": 2560,
        ...

  "logstash-2018.05.04": {
    "primaries": {
      "docs": {
        "count": 5293,
        "deleted": 165
      },
      "store": {
        "size_in_bytes": 2699851,
        "throttle_time_in_millis": 0
      },
      ...
      "flush": {
        "total": 5,
        "total_time_in_millis": 0
      },
      "segments": {
        "count": 25,
        "memory_in_bytes": 228032,
        "terms_memory_in_bytes": 197031,
        "stored_fields_memory_in_bytes": 7048,
        "term_vectors_memory_in_bytes": 0,
        "norms_memory_in_bytes": 0,
        "points_memory_in_bytes": 693,
        "doc_values_memory_in_bytes": 23260,
        "index_writer_memory_in_bytes": 0,
        "version_map_memory_in_bytes": 0,
        "fixed_bit_set_memory_in_bytes": 0,
        "max_unsafe_auto_id_timestamp": 1525392003044,
        "file_sizes": {}
      },
      "translog": {
        "operations": 0,
        "size_in_bytes": 430
      }
    },
    "total": {
      "docs": {
        "count": 5293,
        "deleted": 165
      },
      "store": {
        "size_in_bytes": 2699851,
        "throttle_time_in_millis": 0
      },
      "indexing": {
        "index_total": 0
      },
      "merges": {
        "current": 0,
        "cache_count": 0,
        "evictions": 0
      },
      "fielddata": {
        "memory_size_in_bytes": 0,
        "evictions": 0
      },
      "completion": {
        "size_in_bytes": 0
      },
      "segments": {
        "count": 25,
        "memory_in_bytes": 228032,
        "terms_memory_in_bytes": 197031,
        "stored_fields_memory_in_bytes": 7048,
        "term_vectors_memory_in_bytes": 0,
        "norms_memory_in_bytes": 0,
        "points_memory_in_bytes": 693,
        "doc_values_memory_in_bytes": 23260,
        "index_writer_memory_in_bytes": 0,
        "version_map_memory_in_bytes": 0,
        "fixed_bit_set_memory_in_bytes": 0,
        "max_unsafe_auto_id_timestamp": 1525392003044,
        "file_sizes": {}
      },
      "translog": {
        "operations": 0,
        "size_in_bytes": 430
      }
    }
  },
  ...


(Jymit Singh Khondhu) #2

Today's date gives us 132 days since Jan 1st; 132 * 5 primary shards gives us ~660.
As you mentioned, the daily indices are ~3 MB. I believe you should modify the logstash template away from the default of 5 primary shards and 1 replica down to 1 primary.
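A quick sanity check of that arithmetic (assuming a thread date of 2018-05-12, which is hypothetical, but consistent with the logstash-2018.05.04 index above):

```python
from datetime import date

# Days from Jan 1st up to the assumed "today", inclusive
days = (date(2018, 5, 12) - date(2018, 1, 1)).days + 1

# Default logstash template: 5 primary shards per daily index
shards = days * 5

print(days, shards)  # 132 660
```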

It is very likely your large number of shards is eating away at your resources, hence the heap issues.

Also, a Java heap of -Xms470m -Xmx470m for your single-node ES cluster is low; if you are having heap issues, you need to start by increasing the heap size... I am very surprised that this is up and running with 600+ shards on a 470 MB heap.


(bruno.dev) #3

Thanks mate for the answer, I am a real dummy, sorry. I will try to be accurate.
I ran the following to modify the default template:

PUT /_template/my_logs
{
  "template": "logstash-*",
  "order": 1,
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": false
      }
    }
  }
}

  1. Is it correct?
  2. Before upgrading the heap (taking into account that the RPI3 runs some Python scripts too), I would like to be sure that I am running one shard a day. I don't know how to check this, but I suppose the template command will do it.
  3. As an index is 3 MB a day, is it possible to keep only 5 days in memory and the older data on disk only, and then flush the memory later? If so, how?
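To verify, something like the following (Kibana Dev Tools / curl style, a sketch I have not tested on 5.6.4) should show the stored template and the per-index shard count:

GET /_template/my_logs
GET /_cat/shards/logstash-*?v

Newly created daily indices should then show a single primary shard per index.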

thanks


(Christian Dahlqvist) #4

You have far too many shards for that size of cluster/heap. This blog post provides some practical guidelines. I would recommend changing your Logstash Elasticsearch output to instead use a monthly pattern.

That index template looks correct to me, but I have not tested it.

This is not how Elasticsearch works. All shards use a bit of memory and caching is used, but data is not constantly kept in memory.


(bruno.dev) #5

Hi Christian, thanks for the explanation.

  1. I understand that every shard takes a bit of memory, so with +1 shard per day or month, how can I ever run with constant memory? From my understanding, memory would always increase! When I query my currently active shards (the shards of the day) I prefix the indices with /%3Clogstash-%7Bnow%2Fd%7D%3E/syslog/ (so I query one period per index, i.e. one day), so I still do not understand why all those other shards still use memory (I would prefer they were sleeping deep on disk :grinning:). It's OK if I can't, but is it possible to list the shards in memory and limit the system to the last XXX shards? (If I cannot limit indices, may I limit the maximum number of active shards?)
     There is still something a bit obscure in my understanding...

  2. The post points out that for 600-750 shards I should go to a 30 GB heap (I have 660!). I now run one shard per daily index, with a CPU load of 150 in htop on the RPI3, which is quite reasonable. Do you have a link explaining how I can change the index period from day to month? I will also have to modify all queries using the prefix /%3Clogstash-%7Bnow%2Fd%7D%3E/syslog/ to express indices in year-month only.

thanks very much for the help


(Christian Dahlqvist) #6

You cannot.

All open indices use a bit of heap. You can close indices, which prevents them from using heap, but this also means that they can not be searched without being explicitly opened first.
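For example, to close and later reopen one of the older daily indices (illustrative index name):

POST /logstash-2018.01.15/_close
POST /logstash-2018.01.15/_open

A closed index uses no heap, but any query that touches it will fail until it is opened again.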

If you want to change Logstash to use a monthly index, change your Elasticsearch output block as follows:

elasticsearch {
  index => "logstash-%{+YYYY.MM}"
}

If you use a monthly index with just a single primary shard, you would be able to store a full year's worth of data in just 12 shards, which sounds much more reasonable for a node with your spec.
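For the date-math queries mentioned earlier, the daily &lt;logstash-{now/d}&gt; (URL-encoded as %3Clogstash-%7Bnow%2Fd%7D%3E) would, as far as I understand the date-math index name syntax, need a format override to match the monthly pattern, i.e. &lt;logstash-{now/M{YYYY.MM}}&gt; — for example (untested sketch):

GET /%3Clogstash-%7Bnow%2FM%7BYYYY.MM%7D%7D%3E/syslog/_search

Without the {YYYY.MM} format override, date math renders the default YYYY.MM.dd format, which would not match a monthly index name.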


(bruno.dev) #7

Heya, to go fast I removed the data (I can reindex later from backup if needed). CPU load is now 50 in htop (it was at least 150 on the RPI3). I have 6 shards for the month; I'll check on this later. I changed all the queries and everything is fine, except the map.
My GeoIP filter is running fine (country code, lat, long) but the map is gone... When I select a new map and geohash I get this message: No Compatible Fields: The "logstash-*" index pattern does not contain any of the following field types: geo_point

However, the geoip pipeline filter is running and the coordinates are filled in, as well as city and iso_code. I hit this issue once before but I forgot how I solved it and cannot find it again!


(bruno.dev) #8

Hey guys, any help on the geo_point issue? I can't find a thread that solves it.


(Christian Dahlqvist) #9

As this is a different problem, I would recommend opening a separate issue under the appropriate category.


(system) #10

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.