[NEWBIE] Increase indexing speed for huge logs

OpenSearch/OpenDistro are AWS run products and differ from the original Elasticsearch and Kibana products that Elastic builds and maintains. You may need to contact them directly for further assistance.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns :elasticheart: )

Hi @Christian_Dahlqvist
Thanks a lot.
I will give you all the output on Monday :slight_smile:

I know it's an odd version, but I don't know why the security team imposed this version; I have already told them :+(
Maybe the license is the issue, I don't know. I know of OpenSearch but I haven't tested it, is it very different?
Anyway, Elasticsearch is in production, so I have to work with it :slight_smile: and optimize it :slight_smile: but I love it :slight_smile:

What do you want to know about the 6 hot data nodes? The elasticsearch.yml?

I will check the gc.log. The JVM heap size is configured to 10 GB, and the data nodes have 24 GB of RAM.

To confirm my understanding of the age of an index for an ILM policy, is it:

  • 2 days including the day of creation, i.e. 2
  • or 2 days plus the day of creation, i.e. 3?

I want to roll over the indices; my huge log currently has a 200 GB index, and I want to roll over at a size of 50 GB.
I understand that I have to create one index template with an alias for each index I want to roll over: is my understanding correct?
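For reference, a minimal ILM policy that rolls over at 50 GB might look roughly like this (sketch only; the policy name huge-logs-policy is made up, and max_primary_shard_size only exists in recent 7.x/8.x versions, older ones use max_size instead):

PUT _ilm/policy/huge-logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "7d"
          }
        }
      }
    }
  }
}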

My questions:

  • Can I attach multiple index templates to only one ILM policy, or do I have to create one ILM policy for each index template?
    I can't find this point in the docs, or maybe I'm not searching very well :slight_smile: (see the template sketch after this list)

  • In Kibana, will my existing index patterns still work in Discover even though I create an alias for the rollover?
    I think so... but I'm not sure.
    For me, an index pattern refers to the index name.
    By the way, I will have to check the index name after the rollover.
    Currently, my index name is:

  • business-2023-04-20
    After the rollover, is it by default:

  • business-2023-04-20-001 for the first

  • business-2023-04-20-002 for the second?
    I will re-re-read the rollover docs.
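On the template question above: one ILM policy can be referenced by several index templates, each with its own rollover alias. A rough sketch (composable templates, 7.8+; the template and policy names here are just examples):

PUT _index_template/tdir_business_prod
{
  "index_patterns": ["tdir_business_prod-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "huge-logs-policy",
      "index.lifecycle.rollover_alias": "tdir_business_prod"
    }
  }
}

PUT _index_template/tdir_exceptions
{
  "index_patterns": ["tdir_exceptions-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "huge-logs-policy",
      "index.lifecycle.rollover_alias": "tdir_exceptions"
    }
  }
}

On the naming question: the default rollover suffix is a zero-padded counter, so a bootstrap index like tdir_business_prod-2023.04.20-000001 would roll over to -000002 and so on, and a wildcard index pattern such as tdir_business_prod-* keeps matching the new backing indices.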

Thanks everyone

Yes, the elasticsearch.yml together with the output of the APIs I asked for in my previous post.

Are you currently indexing into a single index or are you using time-based indices?

Hi.
Yes, one daily index per log type, like this:

  • tdir_business_prod-2023.04.20
  • tdir_exceptions-2023.04.20
    .....

I have read all the docs on rollover.
I think I can't use it, if I understand correctly.
To start the rollover, the requirement is to bootstrap the first index manually:

  • tdir_business_prod-2023.04.20-0001
    with is_write_index: true on the alias for it to work.

Is my understanding correct that, with daily indices, I would have to manually create the first -0001 index every day for each index I want to roll over?
Is that right? (See the bootstrap sketch below.)
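A minimal bootstrap sketch, assuming tdir_business_prod is used as the write alias (the bootstrap only needs to be done once per alias, not every day, since rollover itself then creates the next backing index):

PUT tdir_business_prod-2023.04.20-000001
{
  "aliases": {
    "tdir_business_prod": {
      "is_write_index": true
    }
  }
}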

I will give you the elasticsearch.yml on Monday.

Thanks

You are currently using time-based indices the way that was standard before rollover was introduced. With this approach each index covers a fixed time period, but the amount of data it holds can vary depending on ingest volumes over time. If you have large increases or decreases in ingest volume you can end up with shards of very different sizes, which can cause performance problems.

Rollover basically does the opposite. With rollover you specify a target max index/shard size and a maximum time period. This means that an index will no longer cover a set time period. A new backing index is created when the index/shard size exceeds the threshold or when the index covers the maximum time period. As an index can cover a very short time period during high ingest and a long period, typically several days, during low ingest volume (e.g. over the weekend), you typically get a much more even shard size distribution. If you do go with rollover you must not try to align it with specific time boundaries, as that is not how it was designed and it defeats the purpose.

When you set up a rollover index you do it once and then index into a write alias that points to the current backing index. It will roll over at varying times and the sequence number at the end increments. There is generally no need to have a date or timestamp in the index name (it will reflect the index creation date if you do).
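As an illustration of the size-or-age trigger, the rollover conditions can also be checked by hand with a dry run against the write alias (sketch; alias name taken from earlier in the thread, and max_primary_shard_size requires a recent version):

POST tdir_business_prod/_rollover?dry_run
{
  "conditions": {
    "max_primary_shard_size": "50gb",
    "max_age": "7d"
  }
}

The response reports which conditions are currently met without actually rolling the index over.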

This is however unlikely to have a major impact on indexing throughput. To troubleshoot this, let's be systematic and see if we can first identify any bottleneck in Elasticsearch. If there is none, it may be that Filebeat is not fast enough.

  1. Disk I/O is often the limiting factor when it comes to indexing throughput. Can you please share the output from iostat -x from these nodes while they are indexing?
  2. We need to know how the cluster is configured, so please share the output from the _cat/nodes API as well as the node configurations.
  3. Check the logs for evidence of GC issues on all nodes. It would be good to establish that this is not a factor (example commands below).
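For reference, the checks above could look roughly like this (the _cat/nodes column list is only a suggestion, and gc.log is the default JVM log name under path.logs):

# 1. Disk I/O on each data node while indexing is running (5-second samples)
iostat -x 5 3

# 2. Cluster layout and per-node resource usage (Kibana Dev Tools)
GET _cat/nodes?v&h=name,node.role,heap.percent,ram.percent,cpu,load_1m,disk.used_percent

# 3. Scan the GC log for long pauses
grep -i "pause" /var/log/elasticsearch/gc.log | tail -n 20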

Yes, I'll put rollover aside for now, then.

The iostat -x output while today's indexing is running:

Node data 1:

Linux 4.18.0-372.13.1.el8_6.x86_64 (tlragsa028) 	24/04/2023 	_x86_64_	(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1,44    0,01    0,22    0,30    0,07   97,96

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
vda              0,33   28,65    184,67    737,18     0,01     1,20   3,44   4,02   30,60    3,22   0,10   560,25    25,73   1,32   3,84
dm-0             0,01    0,01      0,46      0,09     0,00     0,00   0,00   0,00    5,62   15,13   0,00    41,33    11,32   3,16   0,01
dm-1             0,03    0,02      0,11      0,09     0,00     0,00   0,00   0,00    3,53   43,55   0,00     4,01     4,00   1,09   0,01
dm-2             0,00    0,24      0,15      1,29     0,00     0,00   0,00   0,00   15,03    8,19   0,00    53,15     5,42   1,62   0,04
dm-3             0,17   20,10    253,50    862,60     0,00     0,00   0,00   0,00   25,44    4,00   0,08  1531,80    42,91   1,89   3,82
dm-4             0,00    0,12      0,00      0,61     0,00     0,00   0,00   0,00    2,39    4,26   0,00    14,44     5,31   4,04   0,05
dm-5             0,00    0,00      0,00      0,00     0,00     0,00   0,00   0,00    3,34    4,24   0,00    10,20     6,57   3,92   0,00
dm-6             0,00    0,00      0,00      0,00     0,00     0,00   0,00   0,00    1,88    2,68   0,00    16,39    52,83   1,87   0,00
vdb              0,12    0,48     69,55    127,50     0,00     0,01   0,31   2,54    5,70   18,62   0,01   598,84   263,02   1,39   0,08

Node data 2:

Linux 4.18.0-372.13.1.el8_6.x86_64 (tlragsa029) 	24/04/2023 	_x86_64_	(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1,00    0,01    0,20    0,26    0,04   98,47

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
vda              0,11   28,10     60,19    428,02     0,00     2,01   2,47   6,68   21,86    2,74   0,08   523,74    15,23   1,22   3,45
dm-0             0,01    0,01      0,25      0,08     0,00     0,00   0,00   0,00    4,80   14,54   0,00    31,74    11,62   2,96   0,00
dm-1             0,01    0,01      0,02      0,04     0,00     0,00   0,00   0,00    4,02   69,83   0,00     4,04     4,00   0,84   0,00
dm-2             0,00    0,23      0,04      1,08     0,00     0,00   0,00   0,00   18,12    8,51   0,00    37,33     4,62   1,62   0,04
dm-3             0,05   20,92     59,87    434,61     0,00     0,00   0,00   0,00   21,43    3,87   0,08  1191,27    20,78   1,62   3,40
dm-4             0,00    0,12      0,00      0,63     0,00     0,00   0,00   0,00    2,25    3,97   0,00    11,20     5,42   3,90   0,05
dm-5             0,00    0,00      0,00      0,00     0,00     0,00   0,00   0,00    2,74    3,25   0,00    11,75     7,20   3,15   0,00
dm-6             0,00    0,00      0,00      0,00     0,00     0,00   0,00   0,00    1,63    4,89   0,00    16,44    76,45   1,85   0,00
vdb              0,00    0,11      0,00      8,42     0,00     0,00   0,13   0,23    0,16    7,96   0,00    11,32    79,01   1,24   0,01

Node data 3:

Linux 4.18.0-372.13.1.el8_6.x86_64 (tlragsa030) 	24/04/2023 	_x86_64_	(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1,47    0,01    0,24    0,34    0,06   97,88

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
vda              0,25   35,75    121,79    698,69     0,01     1,81   3,61   4,81   26,71    3,04   0,12   487,67    19,54   1,21   4,36
dm-0             0,01    0,01      0,45      0,12     0,00     0,00   0,00   0,00    5,18   12,91   0,00    36,98    16,14   2,97   0,01
dm-1             0,02    0,02      0,08      0,08     0,00     0,00   0,00   0,00    3,52   43,85   0,00     4,01     4,00   1,03   0,00
dm-2             0,00    1,61      0,27      6,59     0,00     0,00   0,00   0,00   20,53    4,22   0,01    77,12     4,08   0,72   0,12
dm-3             0,13   24,74    175,65    818,46     0,00     0,00   0,00   0,00   21,68    4,12   0,10  1336,88    33,08   1,73   4,30
dm-4             0,00    0,14      0,00      0,69     0,00     0,00   0,00   0,00    2,18    3,51   0,00    13,10     5,05   3,34   0,05
dm-5             0,00    0,00      0,00      0,00     0,00     0,00   0,00   0,00    2,95    4,76   0,00    13,72     6,18   3,94   0,00
dm-6             0,00    0,00      0,00      0,00     0,00     0,00   0,00   0,00    1,31    2,81   0,00    17,14    41,58   1,46   0,00
vdb              0,10    0,54     54,68    127,24     0,00     0,02   0,46   3,15    7,97   16,68   0,01   567,17   236,71   1,44   0,09

Node data 4:

Linux 4.18.0-372.13.1.el8_6.x86_64 (tlragsa031) 	24/04/2023 	_x86_64_	(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1,26    0,01    0,22    0,29    0,05   98,16

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
vda              0,33   32,58    195,36    730,01     0,01     1,26   2,59   3,71   19,00    2,98   0,10   597,47    22,41   1,16   3,80
dm-0             0,01    0,01      0,35      0,12     0,00     0,00   0,00   0,00    4,68   14,16   0,00    37,70    16,50   3,09   0,01
dm-1             0,02    0,02      0,07      0,07     0,00     0,00   0,00   0,00    3,18   40,27   0,00     4,01     4,00   0,94   0,00
dm-2             0,00    1,65      0,26      6,77     0,00     0,00   0,00   0,00   15,16    3,58   0,01    59,58     4,11   0,60   0,10
dm-3             0,14   21,81    207,63    742,90     0,00     0,00   0,00   0,00   19,07    3,82   0,09  1480,05    34,07   1,69   3,70
dm-4             0,00    0,15      0,00      0,74     0,00     0,00   0,00   0,00    2,47    3,20   0,00    25,57     5,07   3,06   0,04
dm-5             0,00    0,00      0,00      0,00     0,00     0,00   0,00   0,00    3,34    3,73   0,00    12,02     6,24   3,36   0,00
dm-6             0,00    0,00      0,00      0,00     0,00     0,00   0,00   0,00    1,74    2,41   0,00    15,77    24,91   1,83   0,00
vdb              0,02    0,21     12,95     20,60     0,00     0,00   0,03   0,13    9,74    8,32   0,00   659,83    98,52   1,27   0,03

Node data 5:

Linux 4.18.0-372.13.1.el8_6.x86_64 (tlragsa043) 	24/04/2023 	_x86_64_	(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2,62    0,01    0,24    0,51    0,01   96,62

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
vda              0,99   37,51    529,19   2152,33     0,05     1,26   4,92   3,25   33,39    5,90   0,25   536,37    57,39   1,42   5,45
dm-0             0,04    0,01      1,65      0,41     0,00     0,00   0,00   0,00    5,12   18,46   0,00    38,54    45,20   3,04   0,02
dm-1             0,10    0,12      0,42      0,47     0,00     0,00   0,00   0,00    2,93   21,33   0,00     4,01     4,00   0,81   0,02
dm-2             0,01    0,25      0,55      2,15     0,00     0,00   0,00   0,00   13,11    6,41   0,00    42,91     8,75   1,76   0,05
dm-3             0,69   25,65   1069,19   3070,76     0,00     0,00   0,00   0,00   19,13    7,42   0,20  1551,77   119,73   2,14   5,62
dm-4             0,00    0,12      0,00      0,58     0,00     0,00   0,00   0,00    2,13    3,50   0,00     8,96     5,01   3,47   0,04
dm-5             0,00    0,00      0,00      0,00     0,00     0,00   0,00   0,00    1,84    6,69   0,00    10,40    15,89   3,78   0,00
dm-6             0,00    0,00      0,00      0,00     0,00     0,00   0,00   0,00    2,35    3,16   0,00    12,97    57,37   2,40   0,00
vdb              0,94    3,42    542,65    922,06     0,00     0,07   0,36   1,99    1,45   18,65   0,07   578,00   269,47   1,26   0,55

Node data 6:

Linux 4.18.0-372.13.1.el8_6.x86_64 (tlragsa044) 	24/04/2023 	_x86_64_	(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1,68    0,01    0,20    0,28    0,00   97,83

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
vda              1,01   23,55    636,70   1549,31     0,04     0,70   3,57   2,87   17,87    5,91   0,16   630,17    65,79   1,29   3,16
dm-0             0,03    0,01      1,11      0,40     0,00     0,00   0,00   0,00    4,36    9,11   0,00    41,79    41,86   2,41   0,01
dm-1             0,08    0,09      0,30      0,35     0,00     0,00   0,00   0,00    3,03   26,52   0,00     4,02     4,00   0,63   0,01
dm-2             0,01    0,23      0,35      1,09     0,00     0,00   0,00   0,00   14,08    6,35   0,00    41,86     4,79   1,79   0,04
dm-3             0,46   15,14    777,92   1809,50     0,00     0,00   0,00   0,00   16,49    6,36   0,10  1679,11   119,49   2,01   3,14
dm-4             0,00    0,11      0,00      0,54     0,00     0,00   0,00   0,00    1,77    3,00   0,00     6,56     4,75   3,05   0,03
dm-5             0,00    0,00      0,00      0,00     0,00     0,00   0,00   0,00    1,64    3,21   0,00    10,51    18,07   2,27   0,00
dm-6             0,00    0,00      0,00      0,00     0,00     0,00   0,00   0,00    1,69    5,88   0,00    13,12    82,38   1,93   0,00
vdb              0,20    1,39    143,03    262,59     0,00     0,01   0,14   0,85    1,21   14,38   0,02   729,77   188,32   1,56   0,25

The elasticsearch.yml:

cluster.name: cluster_elk_2569
node.name: node_data_tlragsa044.a2569

node.roles: [ data ]

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
path.repo: /var/nfs_snapshots_elk

network.host: 172.20.0.44

cluster.initial_master_nodes: ["node_master_tlragsa025.a2569","node_data_master_voting_only_tlragsa026.a2569","node_master_tlragsa027.a2569"]
discovery.seed_hosts: ["172.20.0.25","172.20.0.26","172.20.0.27"]

xpack.ml.enabled: false

xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.client_authentication: required
xpack.security.transport.ssl.key: /etc/elasticsearch/certifs/instance/instance.key
xpack.security.transport.ssl.key_passphrase: xxxxxxx
xpack.security.transport.ssl.certificate: /etc/elasticsearch/certifs/instance/instance.crt
xpack.security.transport.ssl.certificate_authorities: /etc/elasticsearch/certifs/ca/ca.crt

xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.verification_mode: certificate
xpack.security.http.ssl.key: /etc/elasticsearch/certifs/instance/instance.key
xpack.security.http.ssl.key_passphrase: xxxxx
xpack.security.http.ssl.certificate: /etc/elasticsearch/certifs/instance/instance.crt
xpack.security.http.ssl.certificate_authorities: /etc/elasticsearch/certifs/ca/ca.crt
xpack.security.http.ssl.client_authentication: optional


xpack.security.audit.enabled: true
xpack.monitoring.elasticsearch.collection.enabled: true
xpack.monitoring.collection.enabled: true

logger.org.elasticsearch.cluster.coordination.ClusterBootstrapService: TRACE
logger.org.elasticsearch.discovery: TRACE

bootstrap.memory_lock: true

I think I have a bug with index balancing across the different nodes with ILM.

Look, this index is in the cold phase:

"tdir_business_prod-2023.04.18": {
		"settings": {
			"index": {
				"routing": {
					"allocation": {
						"include": {
							"_tier_preference": "data_cold,data_warm,data_hot"

But the shards are on the wrong nodes; these nodes are my hot nodes, not the cold nodes:

tdir_business_prod-2023.04.18 0     r      STARTED 131383993 131.1gb 172.20.0.31 node_data_tlragsa031.a2569
tdir_business_prod-2023.04.18 0     p      STARTED 131383993 131.2gb 172.20.0.29 node_data_tlragsa029.a2569

How can I reallocate/rebalance, please? Because my disks are filling up very, very fast. (See the allocation explain sketch below.)
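One way to see why those shards have not moved to the cold tier might be the allocation explain API (sketch, using the index from above):

GET _cluster/allocation/explain
{
  "index": "tdir_business_prod-2023.04.18",
  "shard": 0,
  "primary": true
}

The response should say which allocation deciders (for example the tier preference or the disk watermarks) are keeping the shard where it is.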

EDIT: I found another index in that situation.

I think my understanding was wrong too :slight_smile:
I mean that in the COLD PHASE, both the replica AND the primary should move to the COLD nodes, not only the replica?
I need this information to size the disk volumes of the nodes and avoid saturation.


tdir_business_prod-2023.04.13 0     r      STARTED 507899 494.5mb 172.20.0.39 node_data_cold_tlragsa039.a2569
tdir_business_prod-2023.04.13 0     p      STARTED 507899 494.5mb 172.20.0.28 node_data_tlragsa028.a2569

EDIT 2:
It's as if only one HOT data node is doing the work.
When I look at the _cat/nodes API, the disk usage of the HOT data nodes seems very different, no? Does anybody think this is a normal situation? (See the _cat/allocation sketch after the node listing.)


**node_data_tlragsa029.a2569               172.20.0.29   d      9      23.1gb          99  23.3gb    999.9gb   161.2gb    838.7gb             16.13**
node_coordination_tlragsa038.a2569       172.20.0.38   -      0         9gb          39  23.3gb     49.9gb     1.1gb     48.7gb              2.38
**node_data_tlragsa030.a2569               172.20.0.30   d      0      23.1gb          99  23.3gb    999.9gb   877.3gb    122.6gb             87.74**
**node_data_tlragsa043.a2569               172.20.0.43   d     14      23.1gb          99  23.3gb    999.9gb   645.9gb      354gb             64.60**
node_ingest_tlragsa033.a2569             172.20.0.33   i      0      15.2gb          65  23.3gb    299.9gb       3gb    296.9gb              1.01
node_data_cold_tlragsa039.a2569          172.20.0.39   c      0      22.4gb          96  23.3gb    999.9gb    12.6gb    987.3gb              1.27
**node_data_tlragsa031.a2569               172.20.0.31   d      0      19.2gb          82  23.3gb    999.9gb   301.4gb    698.5gb             30.14**
node_ingest_tlragsa035.a2569             172.20.0.35   i      1      14.8gb          64  23.3gb     49.9gb     1.1gb     48.7gb              2.38
node_master_voting_only_tlragsa026.a2569 172.17.252.26 mv     0      10.1gb          66  15.4gb     49.9gb     1.4gb     48.5gb              2.83
node_data_warm_tlragsa041.a2569          172.20.0.41   w      0      22.9gb          98  23.3gb    999.9gb    19.3gb    980.6gb              1.93
**node_data_tlragsa044.a2569               172.20.0.44   d     18        23gb          99  23.3gb    999.9gb   589.7gb    410.1gb             58.98**
node_ingest_tlragsa045.a2569             172.20.0.45   i      0      13.3gb          57  23.3gb    299.9gb     2.9gb      297gb              0.98
node_master_tlragsa025.a2569             172.17.252.25 m      2      10.8gb          70  15.4gb     49.9gb     1.4gb     48.5gb              2.83
node_ingest_tlragsa036.a2569             172.20.0.36   i      0      15.2gb          66  23.3gb     49.9gb     1.2gb     48.7gb              2.42
node_ingest_tlragsa046.a2569             172.20.0.46   i      0      13.4gb          57  23.3gb    299.9gb     2.9gb      297gb              0.98
node_data_cold_tlragsa040.a2569          172.20.0.40   c      0      16.3gb          70  23.3gb    999.9gb    13.1gb    986.8gb              1.31
node_master_tlragsa027.a2569             172.17.252.27 m      0       9.3gb          60  15.4gb     49.9gb     1.1gb     48.7gb              2.38
node_data_warm_tlragsa042.a2569          172.20.0.42   w      0      23.1gb          99  23.3gb    999.9gb      36gb    963.9gb              3.61
**node_data_tlragsa028.a2569               172.20.0.28   d      0        23gb          99  23.3gb    999.9gb   546.9gb      453gb             54.69**
node_ingest_tlragsa034.a2569             172.20.0.34   i      0      14.5gb          62  23.3gb     49.9gb     1.4gb     48.5gb              2.85
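A quick way to compare how shards and disk usage are spread across the hot data nodes might be the _cat/allocation API (sketch; the column list is only a suggestion):

GET _cat/allocation?v&s=disk.percent:desc&h=node,shards,disk.indices,disk.used,disk.percent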

thanks
