I would like to update a document that contains a nested object. I've already seen that this seems possible using a script. However, it appears that we can't use params in Logstash ...
In case there is no answer to that: can someone tell me the best way to store an array with columns in Elasticsearch using Logstash, if we receive each entry of the array one by one?
Because I used arrays in my project, I thought about using nested objects with a script.
However, it seems that nested objects are not well supported by Kibana, so I changed my design.
So now, if I have an "Array1" type in my index "index1", each document of type "Array1" holds one entry of my array (input data).
Anyway, I found that you can update an Elasticsearch document from Logstash using its "_id" (which you can customize if you want to be able to find a document again!). Be careful to use a unique "_id" per document.
So, you can imagine having an "Array1" type in your index "index1" with three columns "C1", "C2" and "C3".
First, you index your new data (with an id that you build and can easily recompute), then you can use the "update" action with that same id to update your document whenever new data comes in.
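For example, the elasticsearch output could look something like this (just a sketch; the "key" field used to build the _id is a placeholder for whatever you use to identify an entry):

```
output {
  elasticsearch {
    hosts         => ["localhost:9200"]
    index         => "index1"
    document_type => "Array1"
    # build a predictable _id so the same entry can be updated later
    document_id   => "%{key}"
    # "update" with doc_as_upsert creates the document if it does not exist yet
    action        => "update"
    doc_as_upsert => true
  }
}
```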
Well, if you can change the format of your data, it could be a good idea to send all your data at once.
Thanks for your quick reply. Sorry, my English is bad, so I don't understand exactly what you mean.
In my case, one doc may be updated 3 times with different new fields or with the same fields. The fields that get updated are arrays, and the new value is simply appended to the array, so I must have a unique "id" per doc in ES.
Before I set the script type to file, I was using dynamic scripts, so I could read the data with '%{field}'. But that seems to cause a memory leak. Now I have changed the script type to file, and I need to pass the fields as params to the script through Logstash's elasticsearch output. That's how I found this discussion, but it seems there is no way to do it.
I can show you my script code (it's just one of my three update scripts):
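Roughly, the idea is something like this (Groovy, with simplified field and param names; `new_click` would have to arrive as a script param, which is exactly the problem):

```groovy
// update_click.groovy -- append the new click payload to the "clicks" array
if (ctx._source.clicks == null) {
  ctx._source.clicks = [new_click]
} else {
  ctx._source.clicks += new_click
}
```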
My data is about products and user behaviour. Each doc in ES contains five parts of data:
user request data,
product exposure data,
user click on product data,
add to cart data,
user order placement data.
So the doc id in ES is eventId+productId. When one part of the data is created in ES, the other parts are applied as updates to the same doc id.
I can show you the Logstash conf related to the script above:
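Simplified, it looks roughly like this (the kafka options depend on the plugin version, and the index/topic names are placeholders):

```
input {
  kafka {
    topic_id => "user_events"
  }
}

filter {
  # the Kafka messages are JSON strings, parsed here
  json {
    source => "message"
  }
}

output {
  elasticsearch {
    hosts       => ["localhost:9200"]
    index       => "behaviour"
    # one doc per event/product pair, shared by all five parts of the data
    document_id => "%{eventId}%{productId}"
    action      => "update"
    # file-based script instead of an inline (dynamic) one
    script_lang => "groovy"
    script_type => "file"
    script      => "update_click"
  }
}
```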
My production is still using dynamic scripts, and I have hit PermGen out-of-memory errors twice. I think the dynamic scripts are the cause, so I want to change the script type to file.
You can see my PermGen and heap size are always growing.
I think you can modify your Logstash configuration by specifying "codec => json" in the kafka input, and then drop your json filter.
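Something like this, so the json filter is no longer needed (topic name is just an example):

```
input {
  kafka {
    topic_id => "user_events"
    # decode the Kafka messages as JSON directly, no json filter needed
    codec    => json
  }
}
```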
I don't understand what your script is doing here. Well, if you want to modify some data, you should change it in the filter section and not in your script, but perhaps I misunderstood what you want to do?
About the clone, I think you are right and I will give it a try.
I used the script to do something like auto-increment or to modify an existing field in ES, such as incrementing an integer field or appending a value from the new event to an array field.
Otherwise, I can only code in Java and Python. Maybe I should study Ruby.
@CharlesLdy, @LetMeR00t, when using a script to update documents with Logstash, you can use the variable 'event' within your script, which contains the Logstash event itself!
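For example (a sketch, assuming the Groovy file-script setup discussed above, with placeholder field names): the elasticsearch output passes the whole event to the script under the name set by `script_var_name`, which defaults to "event", so no extra params are needed:

```groovy
// update_click.groovy -- read fields straight from the Logstash event
// increment an integer counter, creating it on first update
ctx._source.click_count = (ctx._source.click_count ?: 0) + 1
// append the new value from the event to an array field
if (ctx._source.clicks == null) {
  ctx._source.clicks = [event.click]
} else {
  ctx._source.clicks += event.click
}
```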