Combine multiple sources


#1

I have logs from multiple sources for the same user, and I need to report on fields from these different sources. Is it possible to store all this data in the same document?
Thanks


(Magnus Bäck) #2

Please see this topic:


#3

Thanks, but I don't understand how to do the merging on the Elasticsearch side. Could somebody help me?


(Magnus Bäck) #4

It'll be easier to help if you can ask a more specific question. What kind of sources do you have? What fields will each source contribute?


#5

I have an XML file like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<devices>
    <device id="413">
        <uuid>2ab941ea-fa62-477b-bfc7-e5a101ac2df7</uuid>
        <principal>152100</principal>
        ...
    </device>
</devices>

a CSV file like this:
PRINCIPAL;FUNCTIONAL_EXPORT_DATE;LENGTH(FILE_CONTENT)
152100;01/07/2015;88576

and another like this:
PRINCIPAL;CART_ID;UPDATE_DATE;TOTAL_AMOUNT
152100;10776692;03/08/2015;2

and I would like to join all of these on the PRINCIPAL field.


(Magnus Bäck) #6

Argh, crap. Sorry. Elasticsearch won't actually merge the documents for you. In this case I don't have a simple solution for you.


#7

In case someone runs into the same difficulty: I managed to do it in Logstash using the elasticsearch filter plugin.


(Sushma Kalle) #8

Hi Hornov,

I need to do the same with two CSV files. Can you give me a sample of your Logstash config file, showing exactly how you used the elasticsearch plugin? I have been playing around with the options in the plugin but couldn't figure it out.

Thanks


#9

I'm not a native English speaker, so sorry in advance for my English.

With the elasticsearch filter plugin I can look up the first document. My rule for finding the original document is a little more complicated than keysourcea=keysourceb, but the idea is this:
I store file A first, then file B.
The config for file B is something like:
elasticsearch {
    hosts => ["myhost/myindex"]
    query => "myquerrystring AND keysourcea:%{keysourceb}"
    fields => ["MY_FIELD1","MY_FIELD1","MY_FIELD2","MY_FIELD2",...]
    sort => "TIMESTAMP_EVT:desc"
}

But if you can define a key, the best way is to put this in your elasticsearch output:
    action => "update"
    doc_as_upsert => true
    document_id => "%{yourkey}"
doc_as_upsert is only useful if both files can arrive at the same time, so that you cannot know which one will be indexed first.
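
As a rough sketch of that second approach, here is what a complete pipeline for the second CSV file from post #5 might look like. The file path, host, and index name are placeholders I made up, and the exact option names can differ between plugin versions, so treat this as an illustration rather than a tested config:

```
input {
  file {
    # Hypothetical path to the cart CSV file; adjust to your setup.
    path => "/path/to/carts.csv"
    start_position => "beginning"
  }
}

filter {
  csv {
    # The sample files use ";" as the delimiter.
    separator => ";"
    columns => ["PRINCIPAL", "CART_ID", "UPDATE_DATE", "TOTAL_AMOUNT"]
  }
  # The header row is parsed like any other line, so drop it.
  if [PRINCIPAL] == "PRINCIPAL" {
    drop { }
  }
  mutate {
    # Remove the raw line so it does not overwrite the one from another source.
    remove_field => ["message"]
  }
}

output {
  elasticsearch {
    # Assumed host and index names.
    hosts => ["myhost"]
    index => "myindex"
    # Merge this event into the document whose id is the join key,
    # creating the document if it does not exist yet.
    action => "update"
    doc_as_upsert => true
    document_id => "%{PRINCIPAL}"
  }
}
```

Each source would get a similar pipeline with its own columns. Because every pipeline writes to the same document id, the fields end up merged in one document. Fields that every source sets, such as @timestamp, keep the value from whichever event was written last, and a second row with the same PRINCIPAL overwrites the fields of the first, so this only fits one-to-one joins.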


(Sushma Kalle) #10

Thank you so much for the reply.
But it didn't work in my case, because I need a many-to-many relation.
Anyhow, your reply did help clarify some doubts.
Thanks again :slight_smile:


#11

If you have many-to-many relations, the elasticsearch filter can't help you, because it can only find one other event.
I think the best way is to build the relation beforehand if you can (repeat the second file, or part of it, for every line, or some lines, of the first).

