Data Storage

Hello eveyone,
I have an issue: The size of the data is not the same in MySQL and Elasticsearch . What can I do for this problem? Someone help me, please

Welcome to our community! :smiley:

Why is that a problem?

1 Like

thank you for reply
That is a problm because the size of data in mysql 500 gb in 6 months (in streaming) but in the elasticsearch just a week the size of data is 418gb that is too much

The data structures are completely different though. To get a better idea of what is happening;

  • What is the mapping of the index?
  • What is the output of _cat/indices/yourindexname?v?
  • What version of Elasticsearch are you running??

The mapping of the index are :
MYSQL => logstash => elasticsearch=> kibana.
The ouput just for today (because I delete the indexes yesterday ) :
HmMins8VQcWMhf1FhDfCaw 1 1 66205606 0 47.2gb 47.2gb
The version of elasticsearch is
version" : {
"number" : "8.4.0",

Please don't post pictures of text, logs or code. They are difficult to read, impossible to search and replicate (if it's code), and some people may not be even able to see them :slight_smile:

1 Like

Thank you for help me
The mapping of the index are :
MYSQL => logstash => elasticsearch=> kibana.
The ouput just for today (because I delete the indexes yesterday ) :
HmMins8VQcWMhf1FhDfCaw 1 1 66205606 0 47.2gb 47.2gb
The version of elasticsearch is
version" : {
"number" : "8.4.0",

What does your Logstash pipeline look like? Does the document count in the index match what you would expect?

Thank you for replay

Please do not post images of text as it can be hard to read and impossible to search.

Is your SELECT statement using the tracking column? It is not possible to tell from the image.

I also see that you are indexing the data twice into two eparate indices. This will naturally take up more space. Have you optimised your mappings along with these guidelines?

1 Like

My SELECT :
SELECT DISTINCT log.ID_CALL as idcall, log.* ,cus.NAME as cus_clientname, cus.CODE_CLIENT as cus_client_code, cus.CODE_CLIENT_EXTERNE as cus_client_code_externe, cus.COUNTRY as cus_client_country, cus.COUNTRY_CODE as cus_client_country_code, pl.PREFIXE as pl_prefixe, pl.COUNTRY as pl_country, pl.COUNTRY_ZONE as pl_country_zone, cm.MODE as cli_mode_name FROM LOG_CDR_TODAY AS log, CONFIG_CUSTOMER as cus , CONFIG_PRICE_LIST as pl, CLI_MODE AS cm WHERE cus.ID_CUSTOMER=log.ID_CUSTOMER AND log.ID_PRICE_LIST=pl.ID_PRICE_LIST AND cm.ID=log.CLI_MODE

TRAKING COLUMN : "idcall"

yeah I have two outputs the first for the history and the second is just for today with cronjob

I have not used this plugin in quite some time, but it seems to me like you have configured a numeric tracking column but not added any condition in the WHERE clause where you only select idcall > :sql_last_value as shown in the example here. I therefore believe you might be index all the data every time the plugin runs. As you are not specifying document IDs in the Elasticsearch output plugins you should be able to search for a idcall value and see how many documents you have in Elasticsearch.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.