Convert ElasticSearch index to sql

Hello,

I currently use elasticsearch for my graylog server, everything works fine but disk space fills up too fast

I understood that it is possible to convert elasticsearch indexes into sql tables, which seems to me to be a good solution because I will then compress these famous sql data

I have not yet found anything to help me, that’s why I come to you

For my graylog server, it has only one node, and I’ve configured it to make a new index every day and the indexes start to be deleted after 365 days to keep one year of logs

--- /mnt/graylog/elasticsearch/nodes/0/indices ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                         /..
  133,8 GiB [##########] /qvR8mKiKQRWUHhXWxRAtGQ
   29,0 GiB [##        ] /pLWbpvYrRWCLmTbuImQ6iQ
   28,5 GiB [##        ] /YvnIgCv5QPugXtrGillH8w
   26,8 GiB [##        ] /dYCzPuBaR6qSAF-4PqwHuQ
   20,2 GiB [#         ] /hXiT4VpxRni39hg2qcN9-g
   19,7 GiB [#         ] /QpkZaipcTNSbyyjXZk3edg
   19,7 GiB [#         ] /zMacyH4ARgOYXjbFRgaUsA
   19,1 GiB [#         ] /Ph7nwRutQOe4e3lrm60K7Q
   18,7 GiB [#         ] /OzlEyYTgRW-ESstP7uqksA
   17,7 GiB [#         ] /ROkebCqiS2SKmtnoLd5K-g
   16,0 GiB [#         ] /P6DUFVqEQo-YseozDhn15A
   16,0 GiB [#         ] /AdixeDwmTTOs6fDIV8yYyg
   15,2 GiB [#         ] /ATP7FWk_SpSqchDm2nRIKA
   12,9 GiB [          ] /zBebioJGRZyg3aVV2nv-Kw
   12,7 GiB [          ] /XhmTQ3n2Roaxe6JCpX6aBQ
    6,6 GiB [          ] /xlq-jtI4Twu0kJJsc-L9PA
    6,3 GiB [          ] /YchNdqu6S-2oUMygO8EEBQ
    3,4 GiB [          ] /YXxNwkPHRmmLz52Ths43zg
    3,3 GiB [          ] /gMVH_r3zQKGz5cJT4M1Htw
    3,3 MiB [          ] /X_qzLuCyTHSqdhqBeh-0qw
    2,3 MiB [          ] /M1YpCI7FQ-eYWh2qHWPJdw
    1,2 MiB [          ] /yJbEaRg7Qvu4pxjV3W0OiQ
   88,0 KiB [          ] /Ze2VXGWKReq6ReoWw8uzXA
   88,0 KiB [          ] /B2yRoFQDRje6sSZrwUZ1yg
   84,0 KiB [          ] /WNniZ76vRZm_U0xiny7OUQ
   84,0 KiB [          ] /BcNUyhiiS6unxRCK3PShlA
   84,0 KiB [          ] /XAzvAgllTnO0GAyMEDX47Q
   84,0 KiB [          ] /n47EEchMT4K1qOqysyR_yw
   84,0 KiB [          ] /gQbHFxbFQe2bQ_9-CLRudg
   36,0 KiB [          ] /mt3t8SLsQ_CNxdCCGgaFBw
   24,0 KiB [          ] /JZu86JSRSqe41AjVMOhJcA
   24,0 KiB [          ] /lNzkHX6ATle462uQwH3fWg

Thank you

Please don't post images of text as they are hard to read, may not display correctly for everyone, and are not searchable.

Instead, paste the text and format it with </> icon or pairs of triple backticks (```), and check the preview window to make sure it's properly formatted before posting it. This makes it more likely that your question will receive a useful answer.

It would be great if you could update your post to solve this.

But then the data won't be online anymore... And not searchable, right?

What is the output of:

GET /
GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v

If some outputs are too big, please share them on gist.github.com and link them here.

hello @dadoonet , thank for the answer

sorry for the picture, the problem is normally fixed

It doesn’t matter that the data is no longer online, I think I will keep 1 month of data before converting them. However, will it be possible to reconvert them in the other direction in exceptional cases?

Concerning what you asked me, the GET/ command does not work for me, but I could see on another topic that it is the equivalent of curl -XGET localhost:9200/, so:

[root@PLISSIEM01 /]# curl -XGET localhost:9200/
{
  "name" : "PLISSIEM01",
  "cluster_name" : "graylog",
  "cluster_uuid" : "F0a2rApkQnKZn3P9uro-4Q",
  "version" : {
    "number" : "7.11.2",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "3e5a16cfec50876d20ea77b075070932c6464c7d",
    "build_date" : "2021-03-06T05:54:38.141101Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

[root@PLISSIEM01 /]# curl -XGET localhost:9200/_cat/nodes?v
ip           heap.percent ram.percent cpu load_1m load_5m load_15m node.role  master name
10.15.20.138           49          98  10    1.77    1.26     1.17 cdhilmrstw *      PLISSIEM01


[root@PLISSIEM01 /]# curl -XGET localhost:9200/_cat/health?v
epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1616667654 10:20:54  graylog green           1         1     50  50    0    0        0             0                  -                100.0%

[root@PLISSIEM01 /]# curl -XGET localhost:9200/_cat/indices?v
health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   graylog_9                       YchNdqu6S-2oUMygO8EEBQ   1   0    9763721            0      6.3gb          6.3gb
green  open   graylog_8                       ATP7FWk_SpSqchDm2nRIKA   1   0   35930001            0     15.1gb         15.1gb
green  open   graylog_7                       OzlEyYTgRW-ESstP7uqksA   1   0   40007039            0     18.7gb         18.7gb
green  open   graylog_6                       YvnIgCv5QPugXtrGillH8w   1   0   40001347            0     28.4gb         28.4gb
green  open   .apm-custom-link                lNzkHX6ATle462uQwH3fWg   1   0          0            0       208b           208b
green  open   .kibana_task_manager_1          X_qzLuCyTHSqdhqBeh-0qw   1   0          8         4309    684.3kb        684.3kb
green  open   .kibana-event-log-7.11.2-000001 B2yRoFQDRje6sSZrwUZ1yg   1   0          6            0     22.4kb         22.4kb
green  open   gl-system-events_2              gQbHFxbFQe2bQ_9-CLRudg   4   0          0            0       832b           832b
green  open   graylog_1                       YXxNwkPHRmmLz52Ths43zg   1   0   20000430            0      3.4gb          3.4gb
green  open   graylog_0                       gMVH_r3zQKGz5cJT4M1Htw   1   0   20000807            0      3.3gb          3.3gb
green  open   gl-system-events_0              WNniZ76vRZm_U0xiny7OUQ   4   0          0            0        1kb            1kb
green  open   gl-system-events_1              XAzvAgllTnO0GAyMEDX47Q   4   0          0            0       832b           832b
green  open   graylog_10                      ROkebCqiS2SKmtnoLd5K-g   1   0   64879636            0     17.6gb         17.6gb
green  open   graylog_5                       pLWbpvYrRWCLmTbuImQ6iQ   1   0   40034467            0       29gb           29gb
green  open   graylog_4                       QpkZaipcTNSbyyjXZk3edg   1   0   30009346            0     19.7gb         19.7gb
green  open   graylog_3                       zMacyH4ARgOYXjbFRgaUsA   1   0   30045940            0     19.6gb         19.6gb
green  open   graylog_12                      AdixeDwmTTOs6fDIV8yYyg   1   0   40985952            0     15.9gb         15.9gb
green  open   graylog_2                       qvR8mKiKQRWUHhXWxRAtGQ   1   0  219472229            0    133.8gb        133.8gb
green  open   graylog_11                      dYCzPuBaR6qSAF-4PqwHuQ   1   0  134916808            0     26.7gb         26.7gb
green  open   graylog_14                      XhmTQ3n2Roaxe6JCpX6aBQ   1   0   37270862            0     12.6gb         12.6gb
green  open   graylog_13                      zBebioJGRZyg3aVV2nv-Kw   1   0   37669952            0     12.8gb         12.8gb
green  open   graylog_16                      Ph7nwRutQOe4e3lrm60K7Q   1   0   60741815            0     19.1gb         19.1gb
green  open   graylog_15                      P6DUFVqEQo-YseozDhn15A   1   0   44014539            0       16gb           16gb
green  open   graylog_18                      xlq-jtI4Twu0kJJsc-L9PA   1   0   30444071            0      6.3gb          6.3gb
green  open   graylog_17                      hXiT4VpxRni39hg2qcN9-g   1   0   72473953            0     20.1gb         20.1gb
green  open   .apm-agent-configuration        JZu86JSRSqe41AjVMOhJcA   1   0          0            0       208b           208b
green  open   gl-events_1                     n47EEchMT4K1qOqysyR_yw   4   0          0            0       832b           832b
green  open   gl-events_0                     BcNUyhiiS6unxRCK3PShlA   4   0          0            0        1kb            1kb
green  open   .kibana_1                       M1YpCI7FQ-eYWh2qHWPJdw   1   0        338          129      2.2mb          2.2mb
green  open   gl-events_2                     yJbEaRg7Qvu4pxjV3W0OiQ   4   0        239            0    365.8kb        365.8kb
green  open   .async-search                   mt3t8SLsQ_CNxdCCGgaFBw   1   0          0            0      3.8kb          3.8kb

small precision, the index of 133G is also big because I set up the creation of a new index every day a little bit late, now I have much lighter indexes, to compress it may pose problem

Thank you for your help

I'd probably use the _split API for index graylog_2 unless the response time is correct for you.

Anyway, coming back to the problem.

You can think of multiple things.

  1. Instead of sending your logs into elasticsearch only, send them to both S3 (or whatever store) and elasticsearch at the same time. Logstash can do that with multiple outputs.
    Then after some time (a week, a month...), DELETE the old indices. I'd recommend using ILM (Index LifeCycle Management) for that. A crontab job should work as well.

  2. Use the hot/warm/cold/frozen architecture of Elasticsearch. See Data tiers | Elasticsearch Reference [7.12] | Elastic. So you can move old data to other cheaper nodes.

  3. Use Snapshot/Restore feature. You can snapshot your daily indices to S3, GCS... and then drop the old indices. With 7.12, we even made those snapshots searchable. (but it requires a commercial licence).

HTH

So we need an other server in all the case, and here there is no longer any question of converting elasticsearch indexes to sql ?

In recent versions you can query Elasticsearch using SQL but I have never heard of any conversion to RDBMS. I would recommend looking at snapshot/restore mechanism to offload data that can be restored when needed.

I understand thanks, but from my point of view a snapshot is a solution to recover an old version in case of loss, even if I delete the original index and only keep the snapshot, will it really save me disk space?

As for the hot/warm/cold architecture, it allows to better manage these data but (even if we move them) they always have the same weight?

Snapshot has the same size as the primary more or less so may not save you space. Index does typically not vary between tiers unless you change to best_compression.

Thank you @Christian_Dahlqvist and @dadoonet for your help, I will try to make a S3 configuration