Question about nested data

I'm trying to read a 100k-line file into Elasticsearch with Logstash.

I have a .txt file whose lines look like this: userid;action.

I want to use my userids as unique keys and import every matching action under each one, but there are duplicate actions which I don't want.

Set the document_id based on the userid (either as-is, or using a fingerprint filter), and duplicate actions for the same userid will then be overwritten.
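Something like this, roughly (a minimal sketch, not your exact setup: it assumes each line arrives in the message field as userid;action, and the host and index name are placeholders):

```
filter {
  # Split each "userid;action" line into two fields.
  dissect {
    mapping => { "message" => "%{userid};%{action}" }
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]   # placeholder host
    index => "user-actions"              # placeholder index name
    document_id => "%{userid}"           # same userid => same document
  }
}
```

On its own this means each new event for a userid replaces the previous document rather than adding to it.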

Yes, but I also want to keep new actions nested under the userid. With what you are saying, wouldn't it overwrite the action?

What I need is:

add
userid = 3
action = walk

If new data matches userid 3 and the new action is unique, add it to the action list, etc.:
action = walk, talk

In that case, use an aggregate filter to collect all the unique actions for a userid (using a Ruby set, perhaps). If the input is not sorted it would be like example 3 in the aggregate filter docs; if it is sorted by userid, then example 4. A sketch of the unsorted case is below. Note that this means Logstash will be holding the entire input file in memory until the timeout triggers.
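A rough, untested sketch of the unsorted case (index/host as in the earlier placeholder example; note the aggregate filter requires a single worker, i.e. pipeline.workers: 1 or -w 1). It uses an array with a membership check rather than a Ruby Set, since the map's contents become event fields directly:

```
filter {
  # Split each "userid;action" line into two fields.
  dissect {
    mapping => { "message" => "%{userid};%{action}" }
  }
  aggregate {
    task_id => "%{userid}"
    code => "
      map['actions'] ||= []
      action = event.get('action')
      # Only keep actions we have not seen for this userid yet.
      map['actions'] << action unless map['actions'].include?(action)
      event.cancel()   # drop the per-line event; only the aggregated one is emitted
    "
    push_map_as_event_on_timeout => true
    timeout_task_id_field => "userid"   # copy the userid into the generated event
    timeout => 120                      # seconds of inactivity before the map is flushed
    timeout_tags => ["aggregated"]
  }
}
```

When the timeout fires, one event per userid is pushed with an actions array containing the unique actions, which you can then index with document_id => "%{userid}" as above.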

Would using PHP with the Elasticsearch module to do this, instead of Logstash, perform well?

I could not say.
