Calling a Python function in Logstash


(Mahdy S ) #1

I'm using Logstash to parse some files into Elasticsearch. I would like to call a Python function, passing it some arguments from Logstash, and then add the value returned by the function as a new field in Elasticsearch. Is that possible?

Otherwise I would be happy to hear any other suggestions for my use case, which is as follows: while parsing an input file with Logstash's grok filter into Elasticsearch, I would like to count the number of documents in the same Elasticsearch index whose field "x" value matches that of the document being parsed (the current line in the input file), divide that by the total number of documents in that index, and add the result as a field to the event I'm parsing. The change in the number of events matters to me, which is why I want to do this while parsing (after each event) and not in a separate script afterwards.


(Magnus Bäck) #2

I'm using Logstash to parse some files into Elasticsearch. I would like to call a Python function, passing it some arguments from Logstash, and then add the value returned by the function as a new field in Elasticsearch. Is that possible?

Not in an efficient way unless you're willing to make it very complicated. If what you want to do can be realized with Ruby you can use the ruby filter.

Otherwise I would be happy to hear any other suggestions for my use case, which is as follows: while parsing an input file with Logstash's grok filter into Elasticsearch, I would like to count the number of documents in the same Elasticsearch index whose field "x" value matches that of the document being parsed (the current line in the input file), divide that by the total number of documents in that index, and add the result as a field to the event I'm parsing. The change in the number of events matters to me, which is why I want to do this while parsing (after each event) and not in a separate script afterwards.

Doing an Elasticsearch query for each event sounds extremely inefficient for a non-trivial number of events. It seems like you're computing the relative frequency of terms, and doing that while indexing seems both inefficient and incorrect, but I obviously don't know the background of this.

There's the elasticsearch filter but I don't think it can do what you need.
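To give a sense of the per-event cost mentioned above, here is a minimal sketch of the two count requests that would be needed for every single event. It only builds the request paths and bodies for Elasticsearch's `_count` API and does not send anything. The function name `build_count_requests` and the index name `sales` are made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def build_count_requests(
    index: str, field: str, value: Any
) -> List[Tuple[str, Dict[str, Any]]]:
    """Return the (path, body) pairs one event would need.

    Hypothetical helper: it only constructs the requests, it does not
    talk to Elasticsearch.
    """
    path = f"/{index}/_count"
    # Request 1: documents whose field exactly matches the event's value.
    matching = {"query": {"term": {field: value}}}
    # Request 2: all documents in the index (the denominator).
    everything = {"query": {"match_all": {}}}
    return [(path, matching), (path, everything)]


requests_for_event = build_count_requests("sales", "x", "foo")
```

That is two round trips per event before the event itself is indexed, and recently indexed documents only become visible to these counts after the next index refresh.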


(Mahdy S ) #3

Yes, I'm trying to compute the relative frequency of terms. However, for each event this should consider only the events that already existed. That's why I'm doing this at indexing time. Another reason is that I'm sending events continuously, so if I did the computation with a script I would have to run it every time I want to view this information, in order to update it for the new events.
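As a rough illustration of "only the events that already existed", here is a minimal Python sketch that keeps running counts in memory instead of querying Elasticsearch. It assumes events arrive as dictionaries in a single process; the function name `annotate_relative_frequency` and the output field `rel_freq` are invented for this example and are not part of Logstash.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Iterator


def annotate_relative_frequency(
    events: Iterable[Dict[str, Any]], field: str = "x"
) -> Iterator[Dict[str, Any]]:
    """Yield a copy of each event with a 'rel_freq' field added.

    'rel_freq' is the share of PREVIOUS events that had the same value
    in `field`; the current event is not counted. The very first event
    gets 0.0 because there is nothing to compare against.
    """
    seen: Counter = Counter()  # value -> number of earlier events
    total = 0                  # number of earlier events overall
    for event in events:
        value = event.get(field)
        annotated = dict(event)
        annotated["rel_freq"] = seen[value] / total if total else 0.0
        # Update the running state only after computing the ratio.
        seen[value] += 1
        total += 1
        yield annotated


sample = [{"x": "a"}, {"x": "b"}, {"x": "a"}, {"x": "a"}]
result = list(annotate_relative_frequency(sample))
```

The counts live only in memory here, so they are lost on restart and are not shared between workers.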


(Christian Dahlqvist) #4

How are you looking to use these fields once they are indexed in Elasticsearch? As Elasticsearch is a search engine, it keeps track of term frequencies out of the box, so it may be possible to address your requirement at query time using standard Elasticsearch features instead.


(Mahdy S ) #5

I'm visualizing the data in ES using Kibana, if that is what you're asking. Again, what I need is the frequency of terms at the moment the event is indexed, which should stay constant for that particular event, so this is different from what ES provides out of the box.

Let's say I'm recording products sold via some e-commerce site and I started indexing yesterday, with a product ID field in each document. The first time a customer buys the product with ID 00000001, the frequency of terms for this product ID is one. I want to take this frequency and keep it constant by attaching it to this document (so the "new" frequency of terms, let's call it "static count", refers to a single document, not to a product ID). If another customer later buys the same product, the "static count" field for the first document should stay 1, while for the new document (which describes the purchase by the 2nd customer) the "static count" becomes 2.
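The "static count" described above can be sketched in a few lines of Python. This is only an in-memory illustration under the assumption that all events pass through one process in order; the class name `StaticCounter` and the field names `product_id` and `static_count` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict


class StaticCounter:
    """Attach a per-document running count keyed by one field."""

    def __init__(self, key_field: str = "product_id") -> None:
        self.key_field = key_field
        self._counts: Dict[Any, int] = defaultdict(int)

    def annotate(self, event: Dict[str, Any]) -> Dict[str, Any]:
        """Return a copy of the event with 'static_count' set.

        The count includes the current event, so the first purchase of
        a product gets 1, the second gets 2, and so on. Earlier
        documents are never touched again.
        """
        key = event[self.key_field]
        self._counts[key] += 1
        return {**event, "static_count": self._counts[key]}


counter = StaticCounter()
first = counter.annotate({"product_id": "00000001", "customer": "A"})
second = counter.annotate({"product_id": "00000001", "customer": "B"})
other = counter.annotate({"product_id": "00000002", "customer": "A"})
```

Here `first` keeps a static count of 1 after `second` is processed with a count of 2. The state is held in memory, so it does not survive a restart and does not work across several parallel workers without a shared store.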


(Christian Dahlqvist) #6

OK, that explains it. However, I can't really think of a way to do that efficiently and at scale...

