Ignore Hate terms

hemantsingal · March 14, 2013, 7:22am

Is there a way to exclude documents containing hate terms like fagt,
Nier etc. from my search results without having to specify them in each
and every query?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

spinscale · March 14, 2013, 7:41am

Hey,

check http://www.elasticsearch.org/guide/reference/index-modules/analysis/stop-tokenfilter.html

You can ignore any words at index time (note: they will still be in the
source of your document, if a search hit is returned).
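
To make the caveat concrete, here is a minimal Python sketch of what a stop filter does at index time. It is not Elasticsearch code: the `STOPWORDS` set and the `analyze` helper are hypothetical stand-ins for an analyzer with a stop token filter. It only shows that the listed words never become searchable terms, while the stored source is left as it was and the document can still match on its other terms.

```python
import re

# Hypothetical stop list; in Elasticsearch this would be the word list
# configured on a token filter of type "stop".
STOPWORDS = {"badword1", "badword2"}

def analyze(text, stopwords=STOPWORDS):
    """Tokenize, lowercase, and drop stop words (rough imitation of a
    standard tokenizer followed by lowercase and stop filters)."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stopwords]

doc = {"text": "This is a badword1 text"}
indexed_terms = analyze(doc["text"])

print(indexed_terms)  # ['this', 'is', 'a', 'text']
print(doc["text"])    # the stored source still contains the word
```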

hemantsingal · March 14, 2013, 7:51am

Not indexing them is good but I really can't show these documents so I
still have to put these terms in every query.

David_G_Ortega · March 14, 2013, 10:05am

Hi, would you like to post this on Stack Overflow? Personally, I would
prefer to answer there.

vineeth_mohan · March 14, 2013, 10:22am

Why don't you make sure that such documents are not indexed? Or at least
periodically run a delete by query on all the documents that need to be
blacklisted.

Thanks
Vineeth

hemantsingal · March 14, 2013, 10:26am

Why don't you make sure that such documents are not indexed?

How do I do that?

Or at least periodically run a delete by query on all the documents that
need to be blacklisted.

Deletes are expensive and not real time.

Clinton_Gormley · March 14, 2013, 11:29am

On Thu, 2013-03-14 at 03:26 -0700, cavebird wrote:

Why don't you make sure that such documents are not indexed?

How do I do that?

Check them in your application before you index them
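
As an illustration of this suggestion, a minimal Python sketch of an application-side check might look like the following. Everything here is an assumption for the sake of the example: the `BLOCKLIST` contents, the `text` field name, and `index_fn`, which stands in for whatever client call actually sends the document to Elasticsearch.

```python
import re

# Hypothetical blocklist; a real one would be loaded from shared storage
# so that every entry point uses the same list.
BLOCKLIST = {"badword1", "badword2"}

def contains_blocked_term(doc, fields=("text",), blocklist=BLOCKLIST):
    """Return True if any listed field contains a blocklisted word."""
    for field in fields:
        tokens = re.findall(r"\w+", str(doc.get(field, "")).lower())
        if any(t in blocklist for t in tokens):
            return True
    return False

def index_if_clean(doc, index_fn):
    """Call index_fn(doc) only for clean documents; return True if indexed."""
    if contains_blocked_term(doc):
        return False
    index_fn(doc)
    return True

indexed = []  # stand-in for the real index
index_if_clean({"text": "a harmless sentence"}, indexed.append)
index_if_clean({"text": "contains BadWord2 here"}, indexed.append)
print(len(indexed))  # 1
```

With several entry points (RoR, Java), the same check would have to be duplicated or put behind a shared service, which is the drawback raised in the next reply.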

hemantsingal · March 14, 2013, 11:39am

Well the data comes into system from multiple entry points and through
different stacks as well (RoR, Java).
Also, I would like the ability to modify terms list in the future.

David_G_Ortega · March 14, 2013, 11:47am

Just in case you want a 100% ES solution, or want to keep all your data
available, here is a possible solution:

Create an analyzer that maps all your bad words to the same token

{
  "index" : {
    "analysis" : {
      "char_filter" : {
        "my_mapping" : {
          "type" : "mapping",
          "mappings" : ["badword1=>bad", "badword2=>bad"]
        }
      },
      "analyzer" : {
        "isBad" : {
          "tokenizer" : "standard",
          "filter" : ["lowercase", "asciifolding", "unique"],
          "char_filter" : ["my_mapping"]
        }
      }
    }
  }
}

Define your field (or fields) as a multi_field, with an isBad sub-field
that uses your isBad analyzer

"myTextField" : {
  "type" : "multi_field",
  "fields" : {
    "myTextField" : { "type" : "string" },
    "isBad" : { "type" : "string", "index_analyzer" : "isBad" }
  }
}

Filter at search time

{
  "from" : 0,
  "size" : 10,
  "query" : {
    "filtered" : {
      "query" : {
        "match_all" : { }
      },
      "filter" : {
        "not" : { "term" : { "myTextField.isBad" : "bad" } }
      }
    }
  }
}

hemantsingal · March 14, 2013, 11:52am

I will definitely try this and get back to you.

vineeth_mohan · March 14, 2013, 12:36pm

I am not getting the whole concept here.
For this to work, shouldn't I write the text to both the myTextField and
isBad fields?

Thanks
Vineeth

David_G_Ortega · March 14, 2013, 1:21pm

When you define a multi_field and send a document with that field name, ES
internally creates the sub-fields using the mapping definition. This is
what happens:

You send:
{text: "this is a badword1 text"}

In ES:
{text.text: [this, is, a, badword1, text]}
{text.isBad: [this, is, a, bad, text]}

Obviously "bad" is far too generic a word; it is better to map to something
like tagFlagged instead of "bad" in the mapping. With tagFlagged, another
example looks like this:

You send:
{text: "this is a badword1, badword2, badword3 text"}

In ES:
{text.text: [this, is, a, badword1, badword2, badword3, text]}
{text.isBad: [this, is, a, tagFlagged, text]} (lowercase, unique)

Since the search filters out documents that have the term tagFlagged in
text.isBad, no flagged posts will appear.
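
The token streams above can be reproduced with a small Python simulation. This is only a sketch of the analysis chain (mapping char filter, tokenizer, lowercase, unique), not Elasticsearch itself; `MAPPINGS`, `char_filter`, `is_bad_analyzer` and `search` are hypothetical names, and asciifolding is left out for brevity. Two assumptions are worth flagging: the replacement token is written in lowercase here because the lowercase filter would lowercase it anyway and a term filter is not analyzed, and the char filter is treated as a case-sensitive substring replacement that runs before tokenizing.

```python
import re

# Hypothetical rules mirroring the "mappings" list of the char filter.
MAPPINGS = {
    "badword1": "tagflagged",
    "badword2": "tagflagged",
    "badword3": "tagflagged",
}

def char_filter(text, mappings=MAPPINGS):
    """Substring replacement on the raw text, before tokenizing.
    Case-sensitive: 'BadWord1' would NOT be replaced."""
    for src in sorted(mappings, key=len, reverse=True):
        text = text.replace(src, mappings[src])
    return text

def is_bad_analyzer(text):
    """char filter -> tokenizer -> lowercase -> unique."""
    tokens = [t.lower() for t in re.findall(r"\w+", char_filter(text))]
    seen, unique = set(), []
    for t in tokens:
        if t not in seen:
            seen.add(t)
            unique.append(t)
    return unique

def search(docs, flag="tagflagged"):
    """Imitates the 'not term' filter on the isBad sub-field."""
    return [d for d in docs if flag not in is_bad_analyzer(d["text"])]

docs = [
    {"text": "this is a clean text"},
    {"text": "this is a badword1, badword2, badword3 text"},
]
print(is_bad_analyzer(docs[1]["text"]))  # ['this', 'is', 'a', 'tagflagged', 'text']
print(search(docs))                      # only the clean document
```

Because the replacement works on substrings, a word that merely contains a listed term would be rewritten as well, which ties into the false-positive concern raised later in the thread.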

Michael_Sick · March 14, 2013, 1:32pm

Also, you should consider the impact of false positives on your system.
Take the following phrase from The Hobbit - "The faggots are reeking".
Perhaps the elves are homophobic but research shows that they are just
admiring burning wood.

http://books.google.com/books?id=hFfhrCWiLSMC&pg=PA48&lpg=PA48#v=onepage&q&f=false

Since analysis for context and sentiment is difficult, you might set up a
review system in which the words you are trying to exclude change a state
field, something like censorStatus=ok,review,notOk, so that on most reads
you only retrieve the "ok" value, and some stewards review the posts that
require it and either allow or disallow them. Without knowing the context
of your system, I'm not sure how likely it is that you need to care, but if
you do, you'll find that being "smart" about the exclusions can be a pain.
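
A rough Python sketch of such a review workflow follows. The three values of `censorStatus` come from the post above; everything else (the `WATCHLIST` contents, the function names, and the in-memory list standing in for the index) is hypothetical.

```python
import re
from enum import Enum

class CensorStatus(Enum):
    OK = "ok"
    REVIEW = "review"
    NOT_OK = "notOk"

# Hypothetical watchlist: a hit only sends the post to review,
# it does not reject it outright.
WATCHLIST = {"badword1", "badword2"}

def initial_status(text, watchlist=WATCHLIST):
    tokens = set(re.findall(r"\w+", text.lower()))
    return CensorStatus.REVIEW if tokens & watchlist else CensorStatus.OK

def apply_review(doc, allowed):
    """Record a steward's decision on a post that was under review."""
    doc["censorStatus"] = (CensorStatus.OK if allowed else CensorStatus.NOT_OK).value
    return doc

def visible(docs):
    """What most reads would return: only posts with status 'ok'."""
    return [d for d in docs if d["censorStatus"] == CensorStatus.OK.value]

posts = [
    {"text": t, "censorStatus": initial_status(t).value}
    for t in ("a harmless post", "a post with badword1 in it")
]
print(len(visible(posts)))           # 1: the flagged post awaits review
apply_review(posts[1], allowed=True)
print(len(visible(posts)))           # 2: the steward allowed it
```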

mrflip · March 14, 2013, 2:13pm

You may enjoy(?) the lists of obscene words I've gathered here: http://www.infochimps.com/datasets/list-of-dirty-obscene-banned-and-otherwise-unacceptable-words

I believe this kind of thing -- for which regexps and lookup tables fall short, yet for which proper NLP is too much work -- is perfect for the percolate feature.

As you index each document, percolate against rule sets as complex or simple-term-matchy as you like, and tag documents with a "probably offensive" flag. Now exclude such altogether, or let visitors opt in/out to flagged documents.

Flip
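
The idea of percolating each document against stored rules and then tagging it can be sketched in plain Python. This does not use the Elasticsearch percolate API; `RULES`, `percolate` and `tag` are hypothetical names, and the rules are ordinary Python predicates standing in for registered queries.

```python
import re

def tokens(text):
    return re.findall(r"\w+", text.lower())

# Hypothetical registered rules: name -> predicate over the token list.
# A real percolator would store queries instead of Python functions.
RULES = {
    "single-term": lambda toks: "badword1" in toks,
    "phrase": lambda toks: any(
        a == "really" and b == "badword2" for a, b in zip(toks, toks[1:])
    ),
}

def percolate(doc, rules=RULES):
    """Return the names of all rules that match the document."""
    toks = tokens(doc["text"])
    return sorted(name for name, rule in rules.items() if rule(toks))

def tag(doc):
    """Attach a 'probably offensive' flag before the document is indexed."""
    matched = percolate(doc)
    return {**doc, "probably_offensive": bool(matched), "matched_rules": matched}

print(tag({"text": "a really badword2 post"}))
print(tag({"text": "a perfectly fine post"}))
```

Search requests can then exclude documents where the flag is set, or expose it as an opt-in for visitors, as described above.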

David_G_Ortega · March 14, 2013, 2:23pm

I think we are both doing the same thing, since we both flag the doc, but I
love your solution, Flip. Brilliant.

vineeth_mohan · March 14, 2013, 3:38pm

Thanks David,
This is a handy piece of knowledge.

Thanks
Vineeth

David_G_Ortega · March 14, 2013, 11:59pm

You are welcome Vineeth

David_Zachariah · March 20, 2013, 12:36am

Hi David,

Thanks for the useful information. For the censored words, if I have a huge list of offensive words, do I need to list all of them in the "term" :{"badw1"=>"bad", "badw2"=>"bad", ..., "badw9999"=>"bad"}, or is there another way of doing this tedious task, such as a file?

Second, what is the performance of percolate? Is it acceptable to use it for sentiment analysis?

Thanks,

David
