What is the best practice around filtering out search results with curse words

varun_kumar · April 28, 2015, 11:01pm

Hey all,
I want to filter docs containing hate words out of my search results.
Currently every search query carries a bool filter listing all of the
words. This results in tons of slow queries, since the list of hate words
is long (so much hatred around).
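
For illustration, the per-query filtering described above might look roughly like the sketch below. The field name `body`, the placeholder word list, and the helper `with_word_filter` are assumptions made up for this example, and the exact query shape depends on the Elasticsearch version in use.

```python
import json

# Hypothetical placeholder list; the real list is much longer.
HATE_WORDS = ["badword1", "badword2", "badword3"]


def with_word_filter(user_query, words, field="body"):
    """Wrap a query so that docs containing any listed word are excluded.

    `field` is an assumed name for the analyzed text field.
    """
    return {
        "query": {
            "bool": {
                "must": [user_query],
                # The whole word list is sent along with every request.
                "must_not": [{"terms": {field: words}}],
            }
        }
    }


request_body = with_word_filter({"match": {"title": "weather"}}, HATE_WORDS)
print(json.dumps(request_body, indent=2))
```

Every request repeats the full word list, which is the part that grows with the size of the list.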

I was wondering what the best practices are for this kind of spam/hate
word filtering.

Here is what we are considering:

1. Pre-process: scan each doc prior to indexing and either mark it as bad
   or do not index it.
   Problem: the documents are indexed from several processes, and it is
   difficult to enforce the rule on any new component someone writes.
2. Create a percolator and run it periodically (not sure of the best
   frequency and timing) to tag all documents with bad words as
   "badDoc": true, then add a filter on that field to all the queries.
   Problem: not sure of the performance impact of running the percolator
   periodically; also, the same discipline problem applies, since every
   query has to remember to exclude badDoc.
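
To make the "badDoc" idea concrete, here is a minimal sketch of tagging a doc before it is indexed and of wrapping a query so that tagged docs are excluded. The field names `title` and `body`, the placeholder word list, the simple regex tokenizer, and both helper functions are assumptions for illustration only; a real setup would need to match whatever analysis the index applies.

```python
import re

# Hypothetical placeholder list; the real list is much longer.
HATE_WORDS = {"badword1", "badword2"}

# Very rough tokenizer: lowercase runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tag_document(doc, text_fields=("title", "body")):
    """Return a copy of doc with a boolean "badDoc" flag added."""
    tokens = set()
    for field in text_fields:
        tokens.update(TOKEN_RE.findall(str(doc.get(field, "")).lower()))
    tagged = dict(doc)
    # True if at least one token appears in the word list.
    tagged["badDoc"] = not tokens.isdisjoint(HATE_WORDS)
    return tagged


def exclude_bad_docs(user_query):
    """Wrap a query so that docs flagged with badDoc: true are filtered out."""
    return {
        "query": {
            "bool": {
                "must": [user_query],
                "must_not": [{"term": {"badDoc": True}}],
            }
        }
    }


flagged = tag_document({"title": "Hello", "body": "some BadWord1 here"})
clean = tag_document({"title": "Hello", "body": "nothing to see"})
search_body = exclude_bad_docs({"match": {"title": "hello"}})
```

With this shape the per-query cost is a single term clause on a boolean field rather than the full word list, but it still relies on every indexing path calling something like `tag_document` and every query going through something like `exclude_bad_docs`.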

Personally, I would favor a pure ES solution. I am sure this is not a new
problem, hence I am seeking expert guidance and best practices.
Any guidance or links would be helpful!

Thanks and Regards
Varun

