I get varying results from this. Sometimes the Elasticsearch index gets all rows from Logstash, sometimes only 4000 rows, sometimes 27000 out of 28000 rows. Sometimes it filters more rows than are in the input, but in those cases Elasticsearch receives all rows.
With this call: curl -XGET 'localhost:9600/_node/stats/events?pretty'
I get these results (all rows are received in Elasticsearch).
Hi Sachin,
When I run the SQL on both tables it returns 28071 rows in total.
The uniqueid in one query is
'INVOICE' + CAST(INVOICE.Id AS VARCHAR(20)) AS uniqueid
and in the other it is
'PERSON' + CAST(PERSON.Id AS VARCHAR(20)) AS uniqueid
so the documents shouldn't overwrite each other.
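For illustration, a minimal sketch of this kind of two-input setup, assuming a SQL Server source; the connection details, driver, column lists, and index name below are placeholders, and only the uniqueid expressions and the idea of using uniqueid as document_id come from the posts above:

input {
  jdbc {
    jdbc_connection_string => "jdbc:sqlserver://HOST;databaseName=DB"     # placeholder
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"   # placeholder
    statement => "SELECT 'INVOICE' + CAST(INVOICE.Id AS VARCHAR(20)) AS uniqueid, INVOICE.* FROM INVOICE"
  }
  jdbc {
    jdbc_connection_string => "jdbc:sqlserver://HOST;databaseName=DB"     # placeholder
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"   # placeholder
    statement => "SELECT 'PERSON' + CAST(PERSON.Id AS VARCHAR(20)) AS uniqueid, PERSON.* FROM PERSON"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myindex"            # placeholder index name
    document_id => "%{uniqueid}"  # uniqueid as _id, so rows from the two tables cannot overwrite each other
  }
}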
I have rerun it a couple of times like this:
Remove the index in Elasticsearch (the delete call is sketched right after these steps).
Start Logstash.
Wait until it has finished according to the logs: [2023-03-31T10:21:56,427][INFO ][logstash.javapipeline ][index] Pipeline terminated {"pipeline.id"=>"index"}. No errors or warnings in the Logstash logs.
Check in Kibana that there are 28000 documents.
Check the Elasticsearch logs. No errors or warnings there either.
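The index removal in the first step is just the standard delete-index call, something like this (the index name "myindex" is a placeholder):

curl -XDELETE 'localhost:9200/myindex?pretty'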
Twice the index had all 28071 documents.
Four times the index had 4000 documents, even after waiting 4 hours.
Four times the index had 27713 documents.
What I can see from localhost:9600/_node/stats/events?pretty is that Logstash is receiving the 28071 rows correctly, but sometimes it filters 53140 rows and sometimes 4000 rows. I'm a bit confused about how that could happen.
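For context, the events section of that endpoint is just a set of counters; with the numbers described above it would look roughly like this (only "in" and "filtered" reflect the runs described, the "out" and duration values are placeholders):

"events" : {
  "in" : 28071,
  "filtered" : 53140,
  "out" : 28071,
  "duration_in_millis" : 0,
  "queue_push_duration_in_millis" : 0
}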
I have many databases pulling data, and my benchmark is:
if DatabaseA has X records, I should have X records in the index.
It seems like you have two jdbc inputs running in one Logstash, i.e. each is pulling 28071 records, and then somehow you are using a filter to combine them? Or are you just putting them into Elasticsearch as is?
That logic is still not clear to me. If you are doing the ingestion as is, then you should have 28071 + 28071 records in Elasticsearch.
Yeah, that's what I thought too. I don't have any filters (see my first post), only inputs from the two tables, person and invoice, which together should generate a total of 28071 records.
I'm running logstash 8.7.0.
Could there be anything in the configuration that could mess with the data?
Are there any other troubleshooting methods?
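One generic way to cross-check, not specific to this thread, is to temporarily add a stdout output next to the elasticsearch one, so you can see and count exactly what Logstash emits (index name is a placeholder):

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myindex"            # placeholder index name
    document_id => "%{uniqueid}"
  }
  # prints every event to the console/log while testing
  stdout { codec => rubydebug }
}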
I think I found out why it's happening. I changed the query (including adding fields), but the new fields didn't reach Elasticsearch, and restarting Logstash didn't fix it either. I think Logstash was stashing the old messages somehow and resending them to Elasticsearch.
I had queue.type: persisted set to help against message loss.
When I changed back to the default setting, queue.type: memory, everything worked as expected: Elasticsearch got the new fields and all rows were indexed.
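For reference, the setting lives in logstash.yml. With a persisted queue, events are buffered on disk (by default under data/queue/<pipeline id>) and can be replayed after a restart, which matches the stale messages described above; clearing that directory while Logstash is stopped is another way to get rid of old events:

# logstash.yml
# queue.type: persisted   # on-disk queue under path.data/queue; events survive restarts and can be re-sent
queue.type: memory        # default; the queue lives only in memory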