#Create a test index with a shingle analyzer and mapping
curl -XPUT localhost:9200/test -d '{
"settings":{
"index":{
"analysis":{
"analyzer":{
"analyzer_shingle":{
"tokenizer":"standard",
"filter":["standard", "lowercase", "filter_stop", "filter_shingle"]
}
},
"filter":{
"filter_shingle":{
"type":"shingle",
"max_shingle_size":5,
"min_shingle_size":2,
"output_unigrams":"true"
},
"filter_stop":{
"type":"stop",
"stopwords":[
"a", "an", "and", "are", "as", "at", "be", "but", "by",
"for", "if", "in", "into", "is", "it",
"no", "not", "of", "on", "or", "such",
"that", "the", "their", "then", "there", "these",
"they", "this", "to", "will", "with"
]
}
}
}
}
},
"mappings":{
"product":{
"properties":{
"title":{
"search_analyzer":"analyzer_shingle",
"index_analyzer":"analyzer_shingle",
"type":"string"
}
}
}
}
}'
#Add some docs to the index
#Each doc needs its own id; reusing an id overwrites the earlier doc
curl -XPOST localhost:9200/test/product/1 -d '{"title" : "EGFR"}'
curl -XPOST localhost:9200/test/product/2 -d '{"title" : "WAS"}'
curl -XPOST localhost:9200/test/product/3 -d '{"title" : "Lung Cancer"}'
curl -XPOST localhost:9200/test/product/4 -d '{"title" : "Lung"}'
curl -XPOST localhost:9200/test/product/5 -d '{"title" : "Cancer"}'
curl -XPOST localhost:9200/test/_refresh
#Analyze API to check out shingling
curl -XGET 'localhost:9200/test/_analyze?analyzer=analyzer_shingle&pretty' -d 'EGFR and WAS Lung Cancer' | grep token
#Sample search should return EGFR, Lung Cancer, Lung, Cancer
curl -XGET 'localhost:9200/test/product/_search?q=title:EGFR+Lung+Cancer&pretty'
#Sample search with stop word should return EGFR, WAS, Lung Cancer, Lung, Cancer
curl -XGET 'localhost:9200/test/product/_search?q=title:EGFR+and+WAS+Lung+Cancer&pretty'
#Sample search with separating word should return EGFR, Lung Cancer, Lung, Cancer
curl -XGET 'localhost:9200/test/product/_search?q=title:EGFR+and+Lung+related+Cancer&pretty'
#Same search as a match query with the standard analyzer; should return EGFR, Lung Cancer, Lung, Cancer
curl -XGET localhost:9200/test/product/_search?pretty -d '{
"query" : {
"match" : {
"title" : {
"query" : "EGFR and Lung related Cancer",
"analyzer":"standard"
}
}
}
}'
curl -X DELETE localhost:9200/test
On Wednesday, July 23, 2014 9:37:03 AM UTC-5, Nick Tackes wrote:
I have created a gist with an analyzer that uses a shingle filter in an
attempt to match sub-phrases.
For instance I have entries in the table with discrete phrases like
EGFR
Lung Cancer
Lung
Cancer
and I want to match these when searching the phrase 'EGFR related lung
cancer'.
My expectation is that the multi-word matches score higher than the
single-word matches, for instance:
- Lung Cancer
- Lung
- Cancer
- EGFR
Additionally, I tried a match query with the standard analyzer, but this
didn't yield the desired result either. One complicating aspect of the
shingle approach is that min_shingle_size has to be 2 or more.
How, then, would I be able to match single words like 'EGFR' or 'Lung'?
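A note on the single-word question: the gist's filter_shingle sets
output_unigrams to true, and with that setting the shingle filter should
emit the single tokens alongside the 2-to-5-word shingles, so a
min_shingle_size of 2 need not rule out matches on 'EGFR' or 'Lung'. The
sketch below is a rough bash simulation of that token stream, not
Elasticsearch code. The shingles function is a hypothetical helper
written for illustration; it assumes already-lowercased input and ignores
the position gaps that a stop filter can leave behind.

```shell
#!/usr/bin/env bash
# Rough simulation of a shingle filter with output_unigrams enabled.
# Hypothetical helper for illustration only; not part of Elasticsearch.
# Usage: shingles MIN MAX word1 word2 ...
shingles() {
  local min=$1 max=$2
  shift 2
  local -a words=("$@")
  local n=${#words[@]}
  local i size
  for ((i = 0; i < n; i++)); do
    # The unigram at this position (what output_unigrams=true keeps).
    echo "${words[i]}"
    # Word n-grams of size min..max that start at this position.
    for ((size = min; size <= max && i + size <= n; size++)); do
      echo "${words[*]:i:size}"
    done
  done
}

# Same sizes as the gist: min_shingle_size=2, max_shingle_size=5.
shingles 2 5 egfr lung cancer
```

For the three words above this prints six tokens, including the single
words 'egfr', 'lung' and 'cancer' as well as 'lung cancer'. The real
token stream can be checked with the _analyze call in the gist.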
thanks
https://gist.github.com/nicktackes/ffdbf22aba393efc2169.js