Phonetic analysis of an attachment


(chris123) #1

I need to provide a phonetic analysis for an attachment.
I have the elasticsearch-analysis-phonetic plugin and
elasticsearch-mapper-attachments plugin installed.
They work fine.

However, I need to find similar-sounding words in the attachment.
Is there an easy way to do this, or will it require changing the attachments
mapper?
If it requires a plugin change, are you (i.e. the Elasticsearch team)
planning to make it?

Chris


(Shay Banon) #2

In the mapper plugin you can control, through the mappings, which specific
fields end up being indexed. See the README here:
https://github.com/elasticsearch/elasticsearch-mapper-attachments. So you
can define a custom analyzer with the phonetic configuration, and then set
the mapping for the field to use that analyzer.

(Replying to Chris123's message of Wed, Jun 20, 2012, 5:36 PM.)


(chris123) #3

Thank you and it works great! I love it!

Here is my test (trying to contribute a little):

Delete the index:

curl -XDELETE 'http://192.168.2.128:9200/test'

Create the index with a Soundex analyzer and use it for the attachment field:

curl -XPUT 'http://192.168.2.128:9200/test/' -d '{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "soundex_analyzer" : {
            "tokenizer" : "standard",
            "filter" : ["standard", "lowercase", "soundex_filter"]
          }
        },
        "filter" : {
          "soundex_filter" : {
            "type" : "phonetic",
            "encoder" : "soundex",
            "replace" : true
          }
        }
      }
    }
  },
  "mappings" : {
    "email" : {
      "_source" : { "enabled" : false },
      "properties" : {
        "emailbody" : { "type" : "attachment" },
        "emailbodysoundex" : {
          "type" : "attachment",
          "fields" : {
            "emailbodysoundex" : { "analyzer" : "soundex_analyzer" }
          }
        }
      }
    }
  }
}'

Index the following message (base64-encoded in the request): "Email body index all the words"

curl -XPOST 'http://192.168.2.128:9200/test/email/?pretty=true' -d '{
  "emailbody" : "RW1haWwgYm9keSBpbmRleCBhbGwgdGhlIHdvcmRzCg==",
  "emailbodysoundex" : "RW1haWwgYm9keSBpbmRleCBhbGwgdGhlIHdvcmRzCg=="
}'
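The base64 strings in that request are just the message text followed by a newline. A small sketch of how such a payload can be produced, assuming a `base64` command-line tool is available (the variable name `payload` is only for illustration):

```shell
# Encode the test message (with its trailing newline) for an attachment field.
payload=$(printf 'Email body index all the words\n' | base64)
echo "$payload"
# prints RW1haWwgYm9keSBpbmRleCBhbGwgdGhlIHdvcmRzCg==
```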

Check that the Soundex codes were indexed (the terms facet lists the terms in the field):

curl -XGET 'http://192.168.2.128:9200/test/_search?pretty=true' -d '{
  "query" : {
    "match_all" : {}
  },
  "facets" : {
    "tag" : {
      "terms" : {
        "field" : "emailbodysoundex"
      }
    }
  }
}'

Search the Soundex field for a word that sounds like "body", e.g. "bady":

curl -XGET 'http://192.168.2.128:9200/test/_search?pretty=true' -d '{
  "query" : {
    "text" : {
      "emailbodysoundex" : { "query" : "bady" }
    }
  }
}'
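For intuition on why "bady" finds "body": Soundex keeps the first letter and turns the following consonants into digits, so both words end up with the same code. Below is a minimal shell sketch of classic American Soundex, for illustration only. The function name `soundex` is made up for this example, ASCII input is assumed, and the encoder inside the phonetic plugin may differ in edge cases.

```shell
# Illustrative Soundex: first letter + up to three consonant digits, zero-padded.
soundex() (
  # Keep ASCII letters only and uppercase them.
  word=$(printf '%s' "$1" | tr -cd 'A-Za-z' | tr 'a-z' 'A-Z')
  if [ -z "$word" ]; then exit 0; fi
  first=$(printf '%s' "$word" | cut -c1)
  # H and W do not separate equal codes, so drop them after the first letter.
  rest=$(printf '%s' "$word" | cut -c2- | tr -d 'HW')
  # Map letters to digits (vowels become 0), squeeze repeated digits,
  # drop the first letter's own digit, then remove the vowel placeholders.
  digits=$(printf '%s%s' "$first" "$rest" \
    | tr 'AEIOUYHWBFPVCGJKQSXZDTLMNR' '00000000111122222222334556' \
    | tr -s '0-6' | cut -c2- | tr -d '0')
  printf '%s%.3s\n' "$first" "${digits}000"
)

soundex body   # prints B300
soundex bady   # prints B300
```

Since the filter is configured with "replace" : true, the index holds these codes instead of the original words, and the query text goes through the same analyzer at search time.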

