Elasticsearch and duplicate content

(Joffrey Hercule) #1

Hi all,
I'm new to Elasticsearch.

I have a problem with duplicate content: I need to work out whether an
incoming item must be inserted as a new record or used to update an
existing one.


  • A car has a brand, a model, a color, ... (10 criteria at most).
  • To find out whether a record already exists in ES, we use a
    points-based algorithm.
  • If the brand matches, we count 1 point. If the brand and the model both
    match, we count 5 points, etc.
  • After the search, we sum the points. If the total is high, we merge the
    item into the existing record; otherwise, we create a new one.
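
To make the rules above concrete, here is a minimal sketch of the points
system in plain Python, outside of ES. The rule table holds only the two
rules given above; the threshold value, the field names and the function
names are assumptions for illustration, and I assume the points of all
matching rules are added together.

```python
from typing import Dict, Sequence, Tuple

# (fields that must all match, points awarded).
# Only these two rules are given in the post; more can be appended.
Rule = Tuple[Tuple[str, ...], int]
RULES: Sequence[Rule] = (
    (("brand",), 1),
    (("brand", "model"), 5),
)

# Hypothetical cut-off: at or above this total, merge instead of create.
MERGE_THRESHOLD = 6


def score(candidate: Dict[str, str], existing: Dict[str, str],
          rules: Sequence[Rule] = RULES) -> int:
    """Sum the points of every rule whose fields all match (assumed additive)."""
    total = 0
    for fields, points in rules:
        if all(f in candidate and candidate[f] == existing.get(f) for f in fields):
            total += points
    return total


def decide(candidate: Dict[str, str], existing: Dict[str, str],
           threshold: int = MERGE_THRESHOLD) -> str:
    """Return "merge" when the total reaches the threshold, else "create"."""
    return "merge" if score(candidate, existing) >= threshold else "create"
```

With these assumed values, a car that matches on brand and model scores
1 + 5 = 6 and is merged, while a car that matches on brand only scores 1 and
is created.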

I tried a bool query with should/match clauses, keeping the top-scoring hit,
but it took a very long time (more than 2 seconds) to retrieve the data with
8 clauses in the bool query.
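
For reference, a query of this kind could be built roughly as follows. This
is only a sketch, not my exact query: the field names, the rule table and the
function name are assumptions, and wrapping each rule in constant_score (so
that its boost is the number of points) is one possible way to make the
_score of the top hit equal the sum of the points. It only builds the request
body as a dict; sending it to ES is left out.

```python
import json
from typing import Dict, Sequence, Tuple

Rule = Tuple[Tuple[str, ...], int]


def build_duplicate_query(car: Dict[str, str], rules: Sequence[Rule],
                          size: int = 1) -> dict:
    """Build a bool/should search body where each rule adds a fixed score."""
    should = []
    for fields, points in rules:
        # Skip rules that need a field the incoming car does not have.
        if not all(f in car for f in fields):
            continue
        should.append({
            "constant_score": {
                # Every field of the rule must match for the points to count.
                "filter": {
                    "bool": {"filter": [{"match": {f: car[f]}} for f in fields]}
                },
                # The boost becomes the score contributed by this clause.
                "boost": points,
            }
        })
    # size=1 keeps only the top-scoring hit.
    return {"size": size, "query": {"bool": {"should": should}}}


if __name__ == "__main__":
    rules = [(("brand",), 1), (("brand", "model"), 5)]
    body = build_duplicate_query({"brand": "Renault", "model": "Clio"}, rules)
    print(json.dumps(body, indent=2))
```

The score of the returned hit can then be compared with the merge threshold
on the application side.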

Do you have a better idea for solving this problem? Thanks in advance for
your help, and sorry for my bad English!

