I am extracting data from an HTTP server through the Apache ingest pipeline, specifying the pipeline to use in the Logstash output, and it indexes correctly. On the other hand, I want to enrich the data already obtained, so I have decided to use an Elasticsearch input that collects the data every 2 minutes. My input currently looks like this:
input {
  # pull everything from the indices matching the index pattern within the given date range
  elasticsearch {
    hosts    => ["elk1:9200","elk2:9200","elk3:9200"]
    ssl      => true
    user     => "elastic"
    password => "xxxx"
    ca_file  => '/path/cert.pem'
    index    => "filebeat-7.16.3-mmmmm-http-server-2022*"
    schedule => "*/2 * * * *"
    size     => 10000
    query    => '{
      "query": {
        "range": {
          "@timestamp": {
            "gte": "now-2m",
            "lte": "now"
          }
        }
      }
    }'
  }
}
But there are some small differences between the two indices. Is there any method to use the pipeline and enrich the data in the same ETL?
Otherwise, what would be the best config for the input?
The first ETL consists of an input and an output with the parameter pipeline => apache-access.
The second has an Elasticsearch input to get the data parsed by ETL 1.
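For reference, a minimal sketch of what ETL 1 looks like in this setup, assuming the events arrive over beats; the port 5044 and the pipeline name apache-access come from this thread, while the hosts and everything else are illustrative:

input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts    => ["elk1:9200","elk2:9200","elk3:9200"]
    # hand each event to the preconfigured Apache ingest pipeline for parsing
    pipeline => "apache-access"
  }
}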
With the data already parsed by the Apache module, I want to extract words from the [url][original] field and digits from the [source][address] field.
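For illustration, an extraction like that could be done with grok filters in the second pipeline, since the parsed ECS fields already exist there; the patterns and the target names url_word and address_digits are assumptions, not the actual config:

filter {
  grok {
    # grab the first word of the original URL (illustrative pattern)
    match => { "[url][original]" => "%{WORD:url_word}" }
  }
  grok {
    # grab the leading digits of the source address (illustrative pattern)
    match => { "[source][address]" => "%{INT:address_digits}" }
  }
}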
My question is whether I could put a filter after the output, or whether there is something similar to the :sql_last_value of the jdbc plugin, or whether there is a more efficient method to do this.
Is there any reason you need such a two-stage pipeline? Filter plugins between the HTTP input and some output seem enough to extract some words or digits from a field.
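A sketch of that single-stage idea, assuming the events carry the raw Apache access line in [message] and arrive over beats; HTTPD_COMBINEDLOG is a stock grok pattern, and the rest is illustrative:

input {
  beats {
    port => 5044
  }
}
filter {
  # parse the raw access line in Logstash itself instead of an ingest pipeline
  grok {
    match => { "message" => "%{HTTPD_COMBINEDLOG}" }
  }
  # ...the word/digit extraction filters sketched above could follow here...
}
output {
  elasticsearch {
    hosts => ["elk1:9200","elk2:9200","elk3:9200"]
  }
}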
Because I don't know how to tell Logstash to use the filebeat-apache-access pipeline before indexing, I need a second pipeline to parse all the fields that I want from the data previously indexed in Elasticsearch.
Can I download the pipeline in a format I can paste into the Logstash pipeline config? Can I put a filter after an output? How can I use the module pipeline and extract all the info that I want in the same pipeline?
Sorry, I'm not familiar with Filebeat and don't understand the filebeat-apache-access pipeline. But I suppose there is a similar Logstash input plugin to access your HTTP server.
Or using the Logstash beats input could be simpler.
Yes, I'm using the Filebeat input via the Logstash port 5044, and at the output I set the pipeline parameter to use the pipeline preconfigured by the Elastic team; that is the first pipeline.
When these parsed fields are indexed into Elasticsearch, I use another pipeline with an Elasticsearch input to pick up this data and parse it again.
Yes, there are small differences. For example, in my first index I could have a count of 101 documents with status code = 200, while in the new one the count is 97. That's just an example.
I tried to correct that difference with the fingerprint plugin, creating an ID for all the documents by concatenating 3 fields ([event][created], [source][address] and [original][url]) and using it as the document_id => "%{fingerprint_id}". It reduced the difference, but did not eliminate it.
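A minimal sketch of that fingerprint setup; the three source fields come from the post above, while the method, key and target name are assumptions:

filter {
  fingerprint {
    # hash the three concatenated fields into one stable document ID
    source              => ["[event][created]", "[source][address]", "[original][url]"]
    concatenate_sources => true
    method              => "SHA256"
    key                 => "fingerprint-key"   # arbitrary HMAC key, assumed
    target              => "fingerprint_id"
  }
}
output {
  elasticsearch {
    hosts       => ["elk1:9200","elk2:9200","elk3:9200"]
    # re-processed events overwrite the same document instead of duplicating it
    document_id => "%{fingerprint_id}"
  }
}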
Part of the reason is that you use a 2-minute span every 2 minutes. There can be gaps and overlaps when a delay occurs somewhere. Even if the pipeline runs strictly every 2 minutes, there is up to about one second of delay before indexed documents become searchable in Elasticsearch, so some documents will be dropped.
You have to either run the 2-minute-span pipeline more frequently, or query a longer span every 2 minutes, and deduplicate the documents by using the fingerprint as you said.
Using the update action reduces the indexing load on the Elasticsearch cluster.
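Sketched out, that suggestion might look like this: an overlapping lookback window (the 5-minute span is an arbitrary example) combined with the fingerprint ID and the update action; doc_as_upsert is assumed so that first-seen documents are still created:

input {
  elasticsearch {
    hosts    => ["elk1:9200","elk2:9200","elk3:9200"]
    index    => "filebeat-7.16.3-mmmmm-http-server-2022*"
    # still runs every 2 minutes, but looks back 5, so windows overlap instead of leaving gaps
    schedule => "*/2 * * * *"
    query    => '{ "query": { "range": { "@timestamp": { "gte": "now-5m", "lte": "now" } } } }'
  }
}
output {
  elasticsearch {
    hosts         => ["elk1:9200","elk2:9200","elk3:9200"]
    document_id   => "%{fingerprint_id}"
    # with the update action, identical re-reads become no-ops instead of full re-indexes
    action        => "update"
    doc_as_upsert => true
  }
}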
I don't know how to do that, because if I increase the span I exceed the size limit, but if I decrease it I think I will not get all the documents. My workflow generates 5,000 documents per minute.
How can I point one pipeline at two indices and balance the outputs?
Sorry for the inconvenience, and thank you very much for the help.
Yes, I have been trying things like the ones we mentioned for several days, but there are always some limitations or problems. I will keep investigating; thanks for the help.