Hey All,
I want to filter out docs containing hate words from my search results.
Currently we add a bool filter listing all of the words to every search
query. This results in tons of slow queries, since the list of hate words
is long (so much hatred around).
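For illustration, a minimal Python sketch of the kind of request body this implies. The field name "body", the function name and the exact bool/must_not shape are assumptions made for the sketch, not necessarily the query in use; the point is only that the whole word list travels with every search.

```python
def build_search_body(user_query, hate_words, text_field="body"):
    """Build a search body that excludes docs containing any listed word.

    text_field is a hypothetical field name. The words are lowercased
    because a terms clause is not analyzed, so it only matches tokens as
    they were indexed (lowercase with the standard analyzer).
    """
    excluded = sorted({word.lower() for word in hate_words})
    return {
        "query": {
            "bool": {
                "must": [{"match": {text_field: user_query}}],
                # The full list is sent and evaluated on every search.
                "must_not": [{"terms": {text_field: excluded}}],
            }
        }
    }
```

The request grows with the word list, which matches the slowdown described above.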
I was wondering what the best practices are for this kind of spam/hate-word
filtering.
Here is what we are considering:
- Pre-process: scan each doc prior to indexing and either mark it as bad
  or do not index it.
  Problem: the documents are indexed from several processes, and it is
  difficult to enforce the rule on any new component someone writes.
- Percolator: create a percolator and run it periodically (not sure of the
  best frequency and timing) to tag all documents with bad words as
  "badDoc" : true, and then have a filter in all the queries.
  Problem: not sure of the performance impact of running the percolator
  periodically; secondly, there is the same problem of discipline, since
  every query has to remember to exclude badDoc.
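To make the two options concrete, here is a rough Python sketch of the tagging step and of the much smaller filter that queries would then carry. Everything except the "badDoc" flag is an assumption for illustration: the field name "body", the helper names and the simple regex tokenizer (which will not match the index analyzer exactly).

```python
import re

# Crude tokenizer used only for this sketch; the real index analyzer
# may split text differently.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tag_document(doc, hate_words, text_field="body"):
    """Return a copy of doc with a boolean "badDoc" flag added."""
    banned = {word.lower() for word in hate_words}
    text = str(doc.get(text_field, "")).lower()
    tokens = set(TOKEN_RE.findall(text))
    tagged = dict(doc)  # leave the caller's document untouched
    tagged["badDoc"] = not tokens.isdisjoint(banned)
    return tagged


def exclude_bad_docs(query):
    """Wrap a query clause so flagged documents are filtered out."""
    return {
        "query": {
            "bool": {
                "must": [query],
                # One cheap term check instead of the whole word list.
                "must_not": [{"term": {"badDoc": True}}],
            }
        }
    }
```

Either way the query side shrinks to a single term check, but every indexing process (or the periodic job) has to set the flag, and every query still has to include the exclusion.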
Personally I would favor a pure ES solution. I am sure this is not a new
problem, so I am seeking expert guidance and best practices.
Any guidance or links would be helpful!
Thanks and Regards
Varun