Basically, I have thousands of embedded devices that periodically upload log files to a backend we've created. After some preprocessing, the log files are ingested into Elasticsearch using the geoip ingest pipeline. At ingestion time, the IP address the device connected from has already been injected into every log line, so every document gets the geoip fields added, together with a bunch of other information related to the log file, such as device serial number, session id etc. Fields such as serial number and session id are always present in every document, and within a log file (= same session-id) they are identical between documents (the serial number will not change during a session, the session-GUID is the same for all documents belonging to a particular log file etc.).
Now, I want to do a transform to compute some more complicated metrics from each log session and I run into trouble...
I figured out a (crude) way to add the fields that are identical and always present within a log file (all documents in a log file have the same session-guid). I start by grouping by session-id, and then I also group by serial number etc., because they will not change within a session-id. So in reality I just group by the session id, and the fields for serial number and whatnot are added to the transform output as a side effect.
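A minimal sketch of that grouping, written as a Python dict that mirrors the JSON body of a pivot transform. The index names (`device-logs`, `device-log-sessions`), the field names (`session_id`, `serial_number`) and the `line_count` aggregation are placeholders for illustration, not the real mapping.

```python
import json

# Hypothetical names: the real index and field names are not given in the post.
SOURCE_INDEX = "device-logs"
DEST_INDEX = "device-log-sessions"


def build_transform(group_fields):
    """Build a pivot transform body with one terms group_by per field."""
    group_by = {
        name: {"terms": {"field": name}}
        for name in group_fields
    }
    return {
        "source": {"index": SOURCE_INDEX},
        "dest": {"index": DEST_INDEX},
        "pivot": {
            "group_by": group_by,
            "aggregations": {
                # Placeholder metric: number of log lines per session.
                "line_count": {"value_count": {"field": group_fields[0]}},
            },
        },
    }


# Only session_id really splits the data; serial_number is constant within
# a session and is listed purely so that it shows up in the output document.
transform_body = build_transform(["session_id", "serial_number"])
print(json.dumps(transform_body, indent=2))
```

Because every extra group_by field is constant within a session, the number of output documents stays the same as when grouping by the session id alone.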
Now, with geoip, it gets trickier... Not all geoip fields are always present after a document has gone through the geoip ingest pipeline (for example region, city etc. may not be added if the device has connected from a remote area), so trying to group_by every geoip field ends up transforming just a fraction of the log files. If a group_by field is missing, the transform just disregards all the documents in that group and performs no aggregations on them, so the only log files that get transformed are the ones whose documents have a complete set of geoip fields.
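The effect can be illustrated with a toy model in plain Python. This is only a simulation of the behaviour described above, not how Elasticsearch implements it, and the sample documents and their values are made up.

```python
from collections import defaultdict


def pivot_count(docs, group_fields):
    """Count documents per group, skipping documents that lack a group field."""
    buckets = defaultdict(int)
    for doc in docs:
        if any(field not in doc for field in group_fields):
            continue  # models the document being ignored by the transform
        key = tuple(doc[field] for field in group_fields)
        buckets[key] += 1
    return dict(buckets)


# Made-up sample data: session "b" connected from somewhere with no city.
docs = [
    {"session_id": "a", "geoip.country_name": "Sweden", "geoip.city_name": "Lund"},
    {"session_id": "a", "geoip.country_name": "Sweden", "geoip.city_name": "Lund"},
    {"session_id": "b", "geoip.country_name": "Norway"},
]

by_session = pivot_count(docs, ["session_id"])
by_session_and_city = pivot_count(docs, ["session_id", "geoip.city_name"])
print(by_session)
print(by_session_and_city)
```

Grouping by the session id alone keeps both sessions, while adding the city as a group field silently drops session "b" from the result.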
So, how can I add geoip fields and values that are present and identical in every document of a group to the result of my transform? It may be that I am missing something obvious here (I'm a newbie on Elastic), but I really need some help on how this is best done: how can I add specific fields and values from a source document to the resulting document of the transform?
Hope this makes sense. Otherwise, ask and I'll try to clarify...
The first hurdle I've tried to overcome this evening is what a map_script that copies the session field and its subfields into the state would look like. I know what I want to do in pseudo-code, but the step from there to Painless is a bit too big for me to pull off right now. Do you have any better examples than the ones provided in the transforms documentation? Or can you point me in the right direction with a code snippet that does something similar? I can't muster up the courage to out my utter lack of Painless scripting skills by providing my ridiculously naive attempts at doing this right now, but if I can get a starting point I can probably figure this out, I hope...
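For reference, the intent can be written down as a small Python model of the four scripted_metric phases (init, map, combine, reduce). This is pseudo-code in Python, not Painless, and the `session` subfields (`id`, `serial`) and the sample documents are made-up placeholders.

```python
def init_script():
    """Per-shard starting state: nothing copied yet."""
    return {"session": None}


def map_script(state, doc):
    """Runs once per document: copy the session object the first time it is seen."""
    if state["session"] is None and "session" in doc:
        state["session"] = dict(doc["session"])  # copies all subfields


def combine_script(state):
    """Runs once per shard: hand back what this shard collected."""
    return state["session"]


def reduce_script(states):
    """Runs once overall: take the first shard result that found a session."""
    for shard_session in states:
        if shard_session is not None:
            return shard_session
    return None


def run(shards):
    """Drive the four phases over a list of shards (each a list of documents)."""
    shard_results = []
    for shard_docs in shards:
        state = init_script()
        for doc in shard_docs:
            map_script(state, doc)
        shard_results.append(combine_script(state))
    return reduce_script(shard_results)


# Made-up documents: the first shard has no session field at all.
shards = [
    [{"message": "boot"}],
    [
        {"message": "start", "session": {"id": "abc", "serial": "SN-1"}},
        {"message": "stop", "session": {"id": "abc", "serial": "SN-1"}},
    ],
]
result = run(shards)
print(result)
```

Since the values are identical within a group, taking the first non-empty copy is enough; the open question is how to express the map and reduce steps in Painless.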