HOWTO: Test newly inserted docs against a list of 14,000 words


(tybreizh29) #1

Hello,
I have to insert XML docs (time-coded words, converted into a mapping with one entry per word) into Elasticsearch.
I have to check whether any words from a 14,000-word list are present in each newly inserted doc.
The 14,000 words are all upper case, whereas the words in my docs are in upper and/or lower case.
As the text is in French, we have words with accents; in the list, the upper case of "é" is E, not É.
How can/should I do that?
Here is what part of a record looks like:

,"wordlist":[{"content":"Hello","conf":"0.997","stime":"0.000","dur":"0.190"},{"content":"world","conf":"1.000","stime":"0.190","dur":"0.210"},

I can change my mapping to add a plain text field.

My first guess was to add a plain text field with only upper case characters and do 14,000 requests. That's ugly, but it's my only idea.
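As a minimal sketch of the normalization described above (upper case, with accents dropped so that "é" becomes E), the check could also be done on the client side before indexing, with one set lookup per word instead of 14,000 requests. The function names, the two-word `watch_list`, and the third record entry ("été") are hypothetical and only illustrate the idea; the first two entries come from the record shown above.

```python
import unicodedata


def normalize(word: str) -> str:
    """Upper-case a word and drop combining accents (é -> E, ç -> C)."""
    # NFD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.upper()


def find_listed_words(wordlist, watch_list):
    """Return the normalized words of a record that appear in watch_list."""
    found = []
    for entry in wordlist:
        key = normalize(entry["content"])
        if key in watch_list and key not in found:
            found.append(key)
    return found


# Hypothetical stand-in for the 14,000-word list (already upper case, no accents).
watch_list = {"HELLO", "ETE"}

record = [
    {"content": "Hello", "conf": "0.997", "stime": "0.000", "dur": "0.190"},
    {"content": "world", "conf": "1.000", "stime": "0.190", "dur": "0.210"},
    # Hypothetical entry, added only to show accent handling.
    {"content": "été", "conf": "1.000", "stime": "0.400", "dur": "0.200"},
]

print(find_listed_words(record, watch_list))  # ['HELLO', 'ETE']
```

Ligatures such as "œ" have no accent to strip and would stay as "Œ", so they may need an explicit replacement depending on how the list spells them.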
Thanks for your help,
marc


(Isabel Drost-Fromm) #2

What exactly are you trying to achieve? The approach to implement might depend on your use case:

If you want to trigger some action whenever any of these words appears in a newly indexed document, the Percolator might be what you want to use:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html
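A rough sketch of what that could look like, assuming a recent Elasticsearch version (7 or later) where the Percolator is a `percolator` field type searched with a `percolate` query; older versions used a different API, so check the page above for the version in use. The field name `plain_text` and the helper names are hypothetical, and the code only builds the JSON request bodies; it sends nothing to a cluster.

```python
import json


def percolator_mapping():
    """Index body: one field holds stored queries, one mirrors the doc text."""
    return {
        "mappings": {
            "properties": {
                "query": {"type": "percolator"},
                # Hypothetical field holding the normalized text of a doc.
                "plain_text": {"type": "text"},
            }
        }
    }


def stored_query(word):
    """One stored query per list word, so a hit tells which word matched."""
    return {"query": {"match": {"plain_text": word}}}


def percolate_request(plain_text):
    """Search body asking which stored queries match a new document."""
    return {
        "query": {
            "percolate": {
                "field": "query",
                "document": {"plain_text": plain_text},
            }
        }
    }


print(json.dumps(percolator_mapping()))
print(json.dumps(stored_query("HELLO")))
print(json.dumps(percolate_request("HELLO WORLD")))
```

Each of the 14,000 words would be registered once as a stored query; every new document is then sent in a single percolate search, and the hits are the stored queries (words) it contains.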

Others may know better solutions.

Regards,
Isabel

