Comparing 2 indices or 2 sets of docs


(Noel) #1

Continuing the discussion from Comparing 2 sources of log input ( using fuzzy? hash? term ?):

So, I am trying to get a range of messages, selected by timestamp, from 2 different data sources and compare them to see if they match (say, match on a particular timestamp & messageid combination).

Which of the following should I use to compare multiple messages?

  1. keep them in 2 different indices
    Or
  2. create 2 different docs (doc_a and doc_b) under the same index?

And how do I do the comparison between the messages (timeA-timeB) from source 1 and source 2? I looked into mlt, but it doesn't seem to be meant for comparing 2 arrays of messages.


(Noel) #2

Any suggestions?


(Mark Walkom) #3

You can't do a join type query in ES, so you'd either need to extract the docs and then compare them externally, or index them into the same index.
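
As a rough illustration of the "extract and compare externally" option, here is a minimal Python sketch. It assumes the messages from each source have already been pulled out of ES into lists of dicts, and that each message carries `timestamp` and `messageid` fields (names taken from the question above). The `compare_sources` helper and the sample data are hypothetical.

```python
def message_key(msg):
    """Build the comparison key: the (timestamp, messageid) combination."""
    return (msg["timestamp"], msg["messageid"])


def compare_sources(source_1, source_2):
    """Compare two lists of message dicts by their (timestamp, messageid) keys."""
    keys_1 = {message_key(m) for m in source_1}
    keys_2 = {message_key(m) for m in source_2}
    return {
        "matched": sorted(keys_1 & keys_2),           # present in both sources
        "only_in_source_1": sorted(keys_1 - keys_2),  # missing from source 2
        "only_in_source_2": sorted(keys_2 - keys_1),  # missing from source 1
    }


# Hypothetical sample data standing in for documents extracted from each source.
source_1 = [
    {"timestamp": "2015-06-01T10:00:00", "messageid": "m1"},
    {"timestamp": "2015-06-01T10:00:05", "messageid": "m2"},
]
source_2 = [
    {"timestamp": "2015-06-01T10:00:00", "messageid": "m1"},
    {"timestamp": "2015-06-01T10:00:09", "messageid": "m3"},
]

result = compare_sources(source_1, source_2)
print(result)
```

Using sets keeps the comparison independent of message order, at the cost of ignoring duplicates within a single source.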


(Noel) #4

OK. Say I index them into one index and give them different doc names (doc_a stuff and doc_b stuff). How do I do the comparison?

Thanks


(Mark Walkom) #5

What do you want to compare exactly?


(Noel) #6

I would like to compare 2 fields from the docs. It seems aggregations would be a good option. For example:

  • Each message has a seqnumber field and a timestamp field. Each doc has multiple messages.
  • I put both the doc_a and doc_b stuff in one index.
  • Using aggs with a terms aggregation on the field "seqnumber", I should get each seqnumber as a bucket key along with its doc_count.
  1. I tried this, but I am not able to get a doc_count of 2 when I put 2 identical sets of messages in the two docs.
    For the field seqnumber, each bucket should show the total across doc_a.seqnumber and doc_b.seqnumber,
    but my result buckets did not have the correct totals.

  2. Another question: I don't know how to get a combination of the seqnumber field and the timestamp field as aggs bucket keys.

Thanks
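
To make the expected result concrete, here is a small client-side Python sketch of the counting described above. It is not an Elasticsearch query. It only mimics bucketing on a combined (seqnumber, timestamp) key, where a count of 2 means the pair appears in both docs. The `bucket_counts` helper and the sample messages are hypothetical.

```python
from collections import Counter


def bucket_counts(*docs):
    """Count, for each (seqnumber, timestamp) pair, how many docs contain it."""
    counts = Counter()
    for messages in docs:
        # Use a set so a pair repeated inside one doc is counted only once.
        pairs = {(m["seqnumber"], m["timestamp"]) for m in messages}
        counts.update(pairs)
    return counts


# Hypothetical messages for the two docs.
doc_a = [
    {"seqnumber": 1, "timestamp": "2015-06-01T10:00:00"},
    {"seqnumber": 2, "timestamp": "2015-06-01T10:00:05"},
]
doc_b = [
    {"seqnumber": 1, "timestamp": "2015-06-01T10:00:00"},
    {"seqnumber": 2, "timestamp": "2015-06-01T10:00:07"},  # timestamp differs
]

counts = bucket_counts(doc_a, doc_b)
# Pairs seen in only one doc are the mismatches.
mismatches = sorted(pair for pair, n in counts.items() if n < 2)
print(counts)
print(mismatches)
```

Here the pair for seqnumber 1 gets a count of 2, while seqnumber 2 produces two separate keys with a count of 1 each because the timestamps differ.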

