Hello eveyone,
I have an issue: The size of the data is not the same in MySQL and Elasticsearch . What can I do for this problem? Someone help me, please
Welcome to our community!
Why is that a problem?
thank you for reply
That is a problm because the size of data in mysql 500 gb in 6 months (in streaming) but in the elasticsearch just a week the size of data is 418gb that is too much
The data structures are completely different though. To get a better idea of what is happening;
- What is the mapping of the index?
- What is the output of
_cat/indices/yourindexname?v
? - What version of Elasticsearch are you running??
The mapping of the index are :
MYSQL => logstash => elasticsearch=> kibana.
The ouput just for today (because I delete the indexes yesterday ) :
HmMins8VQcWMhf1FhDfCaw 1 1 66205606 0 47.2gb 47.2gb
The version of elasticsearch is
version" : {
"number" : "8.4.0",
Please don't post pictures of text, logs or code. They are difficult to read, impossible to search and replicate (if it's code), and some people may not be even able to see them
Thank you for help me
The mapping of the index are :
MYSQL => logstash => elasticsearch=> kibana.
The ouput just for today (because I delete the indexes yesterday ) :
HmMins8VQcWMhf1FhDfCaw 1 1 66205606 0 47.2gb 47.2gb
The version of elasticsearch is
version" : {
"number" : "8.4.0",
What does your Logstash pipeline look like? Does the document count in the index match what you would expect?
Please do not post images of text as it can be hard to read and impossible to search.
Is your SELECT statement using the tracking column? It is not possible to tell from the image.
I also see that you are indexing the data twice into two eparate indices. This will naturally take up more space. Have you optimised your mappings along with these guidelines?
My SELECT :
SELECT DISTINCT log.ID_CALL as idcall, log.* ,cus.NAME as cus_clientname, cus.CODE_CLIENT as cus_client_code, cus.CODE_CLIENT_EXTERNE as cus_client_code_externe, cus.COUNTRY as cus_client_country, cus.COUNTRY_CODE as cus_client_country_code, pl.PREFIXE as pl_prefixe, pl.COUNTRY as pl_country, pl.COUNTRY_ZONE as pl_country_zone, cm.MODE as cli_mode_name FROM LOG_CDR_TODAY AS log, CONFIG_CUSTOMER as cus , CONFIG_PRICE_LIST as pl, CLI_MODE AS cm WHERE cus.ID_CUSTOMER=log.ID_CUSTOMER AND log.ID_PRICE_LIST=pl.ID_PRICE_LIST AND cm.ID=log.CLI_MODE
TRAKING COLUMN : "idcall"
yeah I have two outputs the first for the history and the second is just for today with cronjob
I have not used this plugin in quite some time, but it seems to me like you have configured a numeric tracking column but not added any condition in the WHERE clause where you only select idcall > :sql_last_value
as shown in the example here. I therefore believe you might be index all the data every time the plugin runs. As you are not specifying document IDs in the Elasticsearch output plugins you should be able to search for a idcall
value and see how many documents you have in Elasticsearch.
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.