Ignore hate terms

Is there a way to exclude documents containing hate terms (slurs and the
like) from the output of my search without having to specify the terms in
each and every query?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hey,

check the stop token filter:
http://www.elasticsearch.org/guide/reference/index-modules/analysis/stop-tokenfilter.html
It lets you drop any words at index time (note: they will still be in the
source of your document if a search hit is returned).
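
As a rough illustration of the stop token filter suggestion, the sketch below builds index settings with a custom stop filter from a word list. The names `blocked_terms` and `no_blocked_terms` and the placeholder words are assumptions made up for this example, and the exact settings layout should be checked against the documentation for your Elasticsearch version.

```python
import json

# Hypothetical placeholder list; substitute the real terms.
BLOCKED_TERMS = ["badword1", "badword2"]


def stop_filter_settings(terms):
    """Build index settings with a custom stop token filter for `terms`."""
    return {
        "index": {
            "analysis": {
                "filter": {
                    # "blocked_terms" is an illustrative name, not a built-in filter
                    "blocked_terms": {"type": "stop", "stopwords": list(terms)}
                },
                "analyzer": {
                    "no_blocked_terms": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # lowercase first so the stop list matches regardless of case
                        "filter": ["lowercase", "blocked_terms"],
                    }
                },
            }
        }
    }


settings = stop_filter_settings(BLOCKED_TERMS)
print(json.dumps(settings, indent=2))
```

This only keeps the listed terms out of the index. A document containing them can still match on its other words, which is the limitation raised in the next reply.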

Not indexing the terms is good, but I really can't show these documents at
all, so I would still have to put the terms in every query.

Hi, would you like to post this on Stack Overflow? Personally, I would
prefer to answer there.

Why don't you make sure that such documents are not indexed? Or at least
periodically run a delete by query on all the documents which need to be
blacklisted.

Thanks
Vineeth

Why don't you make sure that such documents are not indexed?

  • How do I do that?

Or at least periodically run a delete by query on all the documents which
need to be blacklisted.

  • Deletes are expensive and not real time.

On Thu, 2013-03-14 at 03:26 -0700, cavebird wrote:

Why don't you make sure that such documents are not indexed?

  • How do I do that?

Check them in your application before you index them.
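
A minimal sketch of such an application-side check, assuming documents are plain dicts with a `text` field and that whole-word, case-insensitive matching is good enough. The function names and the placeholder word list are made up for this example.

```python
import re

# Hypothetical placeholder list; substitute the real terms.
BLOCKED_TERMS = {"badword1", "badword2"}


def contains_blocked_term(text, blocked=BLOCKED_TERMS):
    """Return True if any whole word in `text` is in `blocked` (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    return any(word in blocked for word in words)


def documents_to_index(docs, field="text"):
    """Keep only the documents whose `field` contains no blocked term."""
    return [doc for doc in docs if not contains_blocked_term(doc.get(field, ""))]


docs = [{"text": "a perfectly fine post"}, {"text": "this has BadWord1 in it"}]
print(documents_to_index(docs))
```

A check like this has to be repeated in every code path that writes to the index, which is the objection raised in the reply that follows.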

Well, the data comes into the system from multiple entry points and through
different stacks as well (RoR, Java).
Also, I would like to be able to modify the list of terms in the future.

In case you want a 100% ES solution, or want to keep all your data
available, here is a possible solution:

  1. Create an analyzer that transforms all your bad words into the same token

{
  "index" : {
    "analysis" : {
      "char_filter" : {
        "my_mapping" : {
          "type" : "mapping",
          "mappings" : ["badword1=>bad", "badword2=>bad"]
        }
      },
      "analyzer" : {
        "isBad" : {
          "tokenizer" : "standard",
          "filter" : ["lowercase", "asciifolding", "unique"],
          "char_filter" : ["my_mapping"]
        }
      }
    }
  }
}

  2. Set your field or fields as multi_field, with an isBad sub-field that
    uses your isBad analyzer

"myTextField" :
{
  "type" : "multi_field",
  "fields" :
  {
    "myTextField" : { "type" : "string" },
    "isBad" : { "type" : "string", "index_analyzer" : "isBad" }
  }
}

  3. Filter at search time

{
  "from" : 0,
  "size" : 10,
  "query" :
  {
    "filtered" :
    {
      "query" :
      {
        "match_all" : { }
      },
      "filter" :
      {
        "not" : { "term" : { "myTextField.isBad" : "bad" } }
      }
    }
  }
}
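
Since the list of terms is expected to change, the settings and the search body above could be generated from a word list rather than written by hand. The following is only a sketch under that assumption: the helper names `build_flag_analysis` and `build_search` are hypothetical, and the request shapes simply mirror the JSON in this message rather than any particular Elasticsearch version.

```python
import json


def build_flag_analysis(terms, flag="bad"):
    """Build the char_filter/analyzer settings, mapping every term to `flag`."""
    return {
        "index": {
            "analysis": {
                "char_filter": {
                    "my_mapping": {
                        "type": "mapping",
                        "mappings": ["%s=>%s" % (term, flag) for term in terms],
                    }
                },
                "analyzer": {
                    "isBad": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "asciifolding", "unique"],
                        "char_filter": ["my_mapping"],
                    }
                },
            }
        }
    }


def build_search(flag="bad", field="myTextField.isBad"):
    """Build a match_all search that excludes documents carrying the flag token."""
    return {
        "from": 0,
        "size": 10,
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                # The analyzer's lowercase filter also lowercases the flag,
                # so the (unanalyzed) term filter uses the lowercased form.
                "filter": {"not": {"term": {field: flag.lower()}}},
            }
        },
    }


settings = build_flag_analysis(["badword1", "badword2"])
search = build_search()
print(json.dumps(settings, indent=2))
print(json.dumps(search, indent=2))
```

Changing the word list still means updating the index settings and reindexing the affected documents, because the flag token is produced at index time.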

I will definitely try this and get back to you. :)

I am not getting the whole concept here.
For this to work, shouldn't I write the text to both the myTextField and
isBad fields?

Thanks
Vineeth

When you set a multi_field and send a document with that field name, ES
internally creates the sub-fields using the mapping definition. What
happens is this:

You send:
{text: "this is a badword1 text"}

In ES:
{text.text: [this, is, a, badword1, text]}
{text.isBad: [this, is, a, bad, text]}

Obviously "bad" is far too generic a word; it is better to map to something
like tagFlagged instead of "bad" in the mapping. With tagFlagged, this is
what happens:

You send:
{text: "this is a badword1, badword2, badword3 text"}

In ES:
{text.text: [this, is, a, badword1, badword2, badword3, text]}
{text.isBad: [this, is, a, tagflagged, text]} (lowercase, unique)

Note that the lowercase filter also lowercases the replacement, so the
indexed token is tagflagged. Since the search filter excludes documents with
the term tagflagged in text.isBad, no flagged posts are going to appear.
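
The analysis chain described above can be approximated locally to see which tokens end up in the isBad sub-field. This is a simplified stand-in written for illustration, not Elasticsearch code: the mapping char filter is modelled as plain substring replacement, the standard tokenizer as a split on non-word characters, and asciifolding is left out. One thing it makes visible is that the lowercase filter runs after the char filter, so a replacement such as tagFlagged is indexed as tagflagged.

```python
import re


def analyze_is_bad(text, mappings):
    """Approximate char_filter (mapping) -> tokenizer -> lowercase -> unique."""
    # Char filter: substring replacement on the raw text, before tokenizing.
    for source, target in mappings.items():
        text = text.replace(source, target)
    # Rough stand-in for the standard tokenizer: split on non-word characters.
    tokens = re.findall(r"\w+", text)
    # lowercase token filter (applies to the replacement token as well)
    tokens = [token.lower() for token in tokens]
    # unique token filter: drop repeated tokens, keeping the first occurrence
    seen = set()
    unique = []
    for token in tokens:
        if token not in seen:
            seen.add(token)
            unique.append(token)
    return unique


mappings = {
    "badword1": "tagFlagged",
    "badword2": "tagFlagged",
    "badword3": "tagFlagged",
}
tokens = analyze_is_bad("this is a badword1, badword2, badword3 text", mappings)
print(tokens)  # ['this', 'is', 'a', 'tagflagged', 'text']
```

In this sketch the replacement step is case-sensitive, so "Badword1" is not flagged. Char filters run on the raw text before the lowercase token filter, so the real chain likely behaves the same way, which is worth verifying with the analyze API on a test index.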

Also you should consider the impacts of false positives on your system.
Take the following phrase from The Hobbit - "The faggots are reeking".
Perhaps the elves are homophobic but research shows that they are just
admiring burning wood.

http://books.google.com/books?id=hFfhrCWiLSMC&pg=PA48&lpg=PA48#v=onepage&q&f=false

Since analysis for context and sentiment is difficult, you might set up a
review system in which the words you are trying to exclude change a state
field, something like censorStatus=ok,review,notOk. On most reads you only
retrieve documents with the "ok" value, and some stewards review the posts
that require it and either allow or disallow them. Without knowing the
context of your system, I am not sure how likely it is that you need to
care, but if you do, you'll find that being "smart" about the exclusions
can be a pain.
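
A small sketch of the review workflow described above, assuming a `censorStatus` field with the three values ok, review and notOk. The helper names and the placeholder word list are hypothetical, and the read filter assumes `censorStatus` is indexed without analysis so that its values match exactly.

```python
import re

# Hypothetical placeholder list of terms that trigger a review.
WATCH_TERMS = {"badword1", "badword2"}


def initial_censor_status(text, watch_terms=WATCH_TERMS):
    """Return "review" if any watched term appears in `text`, else "ok"."""
    words = set(re.findall(r"\w+", text.lower()))
    return "review" if words & watch_terms else "ok"


def apply_review(doc, allow):
    """Record a steward's decision on a document that is awaiting review."""
    if doc["censorStatus"] != "review":
        raise ValueError("document is not awaiting review")
    doc["censorStatus"] = "ok" if allow else "notOk"
    return doc


# Filter for normal reads: only documents whose censorStatus is "ok".
VISIBLE_FILTER = {"term": {"censorStatus": "ok"}}

doc = {"text": "a post mentioning badword1"}
doc["censorStatus"] = initial_censor_status(doc["text"])
print(doc["censorStatus"])  # review
apply_review(doc, allow=True)
print(doc["censorStatus"])  # ok
```

Documents with a watched term stay hidden until someone decides, so a false positive costs a delay rather than a permanently hidden post.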

You may enjoy(?) the lists of obscene words I've gathered here: http://www.infochimps.com/datasets/list-of-dirty-obscene-banned-and-otherwise-unacceptable-words

I believe this kind of thing -- for which regexps and lookup tables fall short, yet for which proper NLP is too much work -- is perfect for the percolate feature.

As you index each document, percolate it against rule sets as complex or simple-term-matchy as you like, and tag matching documents with a "probably offensive" flag. You can then exclude such documents altogether, or let visitors opt in or out of seeing flagged documents.

Flip
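
The tagging idea can be sketched locally as follows. This is only a stand-in for the concept: it does not call Elasticsearch's percolate API, whose request format depends on the version, and the rule set, rule names and field names are all invented for the example.

```python
import re

# Hypothetical rule set: rule name -> terms that must ALL appear in a document.
RULES = {
    "single-term": {"badword1"},
    "combination": {"badword2", "badword3"},
}


def matching_rules(text, rules=RULES):
    """Return the sorted names of the rules whose terms all occur in `text`."""
    words = set(re.findall(r"\w+", text.lower()))
    return sorted(name for name, terms in rules.items() if terms <= words)


def tag_document(doc, field="text"):
    """Return a copy of `doc` with a probablyOffensive flag and the matched rules."""
    hits = matching_rules(doc.get(field, ""))
    return dict(doc, probablyOffensive=bool(hits), matchedRules=hits)


print(tag_document({"text": "contains badword2 and badword3"}))
```

With a real percolator the rules would be stored queries and the matching would happen inside Elasticsearch. The flag written onto the document is what search-time filters would then use.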

I think we both do the same thing, since we both flag the doc, but I love
your solution, Flip. Brilliant.

Thanks David,
This is a handy piece of knowledge.

Thanks
Vineeth

You are welcome, Vineeth :)

Hi David,

Thanks for the useful information. For the censored words, if I have a huge list of offensive words, do I need to list all of them in the "mappings" ("badw1=>bad", "badw2=>bad", ..., "badw9999=>bad"), or is there another way to handle this tedious task, such as loading them from a file?

Second, how well does percolate perform? Is it acceptable to use it for sentiment analysis?

Thanks,

David