We want to use Elasticsearch's synonym support for our text search.
Problem illustration:
We create a synonym analyzer when the index is created.
String synonymsPath = "./word_synonyms.txt";
Settings settings = ImmutableSettings.settingsBuilder().loadFromSource(
    XContentFactory.jsonBuilder()
        .startObject()
            .startObject("analysis")
                .startObject("filter")
                    .startObject("word_filter_synonyms")
                        .field("type", "synonym")
                        .field("synonyms_path", synonymsPath)
                        .field("ignore_case", true)
                        .field("expand", true)
                    .endObject()
                .endObject()
                .startObject("analyzer")
                    .startObject("word_synonym")
                        .field("tokenizer", "whitespace")
                        .field("filter", new String[] {"lowercase", "word_filter_synonyms"})
                    .endObject()
                .endObject()
            .endObject()
        .endObject().string()).build();
CreateIndexRequestBuilder createIndexReqBuilder =
    client.admin().indices().prepareCreate(indexName).setSettings(settings);
createIndexReqBuilder.execute().actionGet();
// Adding index items
getEsClient().prepareIndex(indexName, ".percolator", word)
    .setSource(XContentFactory.jsonBuilder()
        .startObject()
            // Registering the query
            .field("query",
                QueryBuilders.matchPhraseQuery("content", word).analyzer("word_synonym"))
        .endObject())
    .setRefresh(true)
    .execute().actionGet();
With the above code snippet, we indexed the word "webservice".
We also use a highlighter during search, which should highlight every
matching word and return the actual synonym that matched.
DocBuilder docBuilder = PercolateSourceBuilder.docBuilder().setDoc(
    XContentFactory.jsonBuilder()
        .startObject().field("content", text).endObject());
PercolateResponse response = client.preparePercolate()
    .setIndices(indexName).setDocumentType("type")
    .setSize(100)
    .setPercolateDoc(docBuilder)
    .setHighlightBuilder(new HighlightBuilder().field("content", 0, 0))
    .setScore(true)
    .execute().actionGet();
List<String> matchedKeywords = new ArrayList<String>();
for (PercolateResponse.Match match : response) {
    // Each entry maps a field name to its highlighted fragments
    Iterator<Map.Entry<String, HighlightField>> iter =
        match.getHighlightFields().entrySet().iterator();
}
The synonym file formats we tried are:
webservice webservices web-service
webservice, webservices, web-service
Now, when we search a text such as "Synonym testing webservices web-service
and web service", the results are neither correct nor consistent.
It highlights "web-service" and "web service" seemingly at random, but not "webservices".
Sometimes it highlights only "webservices".
We are confused by this behavior. Our impression is that it
might work correctly when there are no spaces in the synonyms and the indexed items.
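To make the suspicion about spaces concrete, here is a minimal plain-Java sketch of what we assume a whitespace tokenizer followed by a lowercase filter does to the sample text. It is not Elasticsearch code and does not call any Elasticsearch API; the class name WhitespaceTokenSketch and the method tokenize are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: approximates "whitespace" tokenizer + "lowercase" filter.
class WhitespaceTokenSketch {

    // Splits the text on runs of whitespace and lowercases each resulting token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Synonym testing webservices web-service and web service";
        // prints [synonym, testing, webservices, web-service, and, web, service]
        System.out.println(tokenize(text));
    }
}
```

Under this assumption, "webservices" and "web-service" each stay a single token, while "web service" becomes two separate tokens ("web" and "service"), so a synonym entry containing a space would have to match across two tokens rather than one.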
Please suggest how to proceed.