Semantic search engine on top of ES (any suggestions/comments?)

Yongyao_Jiang · February 11, 2016, 5:05pm

Hey guys,

I am trying to build a semantic search engine on top of ES, and my idea is below. Does it make sense to you, especially the ranking part? Feel free to be critical.

Simply put, if someone enters the keyword "sport", I traverse the ontology/graph that I already have to find related keywords such as "tennis" and "football", each with a different weight. Then I use "sport" along with the related words to form a new query to ES. Once ES returns the results, I add the weights to the relevance scores and re-rank the results.
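
A minimal sketch of the expand-then-re-rank idea, assuming a tiny in-memory ontology. The `ONTOLOGY` dict, the hit layout (`id`, `score`, `terms`), and the rule of adding the best matching weight to the score are all assumptions made for illustration; real hits would come from ES.

```python
# Hypothetical ontology: keyword -> {related keyword: weight in (0, 1]}.
ONTOLOGY = {
    "sport": {"tennis": 0.8, "football": 0.7},
}


def expand(keyword, ontology):
    """Return the keyword itself (weight 1.0) plus its weighted related keywords."""
    weights = {keyword: 1.0}
    weights.update(ontology.get(keyword, {}))
    return weights


def rerank(hits, weights):
    """Sort hits by ES score plus the weight of the best matching term.

    Each hit is a dict with hypothetical keys "id", "score" and "terms"
    (the expanded terms that matched the document).
    """
    def combined(hit):
        bonus = max((weights.get(t, 0.0) for t in hit["terms"]), default=0.0)
        return hit["score"] + bonus

    return sorted(hits, key=combined, reverse=True)


weights = expand("sport", ONTOLOGY)

# Stand-in for the results ES would return for the expanded query.
hits = [
    {"id": "a", "score": 1.2, "terms": ["football"]},
    {"id": "b", "score": 1.0, "terms": ["sport"]},
    {"id": "c", "score": 1.4, "terms": ["tennis"]},
]
ranked = [hit["id"] for hit in rerank(hits, weights)]
print(ranked)
```

Adding the weight to the score is only one possible combination rule; multiplying the two is another common choice.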

Thanks,
Cody

jprante · February 11, 2016, 7:49pm

You should index all related terms together with a term. Then you don't need to traverse and re-rank at search time, which is bad for performance and does not scale.

I use this in my reference plugin https://github.com/jprante/elasticsearch-analysis-reference
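
To illustrate index-time expansion in general terms, here is a sketch that enriches a document on the client before it is sent to ES. It is not how the plugin above is implemented; the `RELATED` map, the `related_terms` field name, and the whitespace tokenization are assumptions. The map points from a term to the broader terms the document should also be findable under, so a search for "sport" would match a document that only mentions "tennis".

```python
# Hypothetical map: term found in a document -> extra terms to index with it.
RELATED = {
    "tennis": ["sport"],
    "football": ["sport"],
}


def expand_document(doc, related, field="text", target="related_terms"):
    """Return a copy of doc with an extra field listing the related terms."""
    tokens = doc[field].lower().split()  # naive tokenization, for illustration
    extra = []
    for token in tokens:
        for term in related.get(token, []):
            if term not in extra and term not in tokens:
                extra.append(term)
    enriched = dict(doc)
    enriched[target] = extra
    return enriched


doc = expand_document({"text": "Tennis final tonight"}, RELATED)
print(doc)
```

Because the related terms are stored with the document, no traversal or re-ranking is needed at search time; the cost is that documents must be re-indexed when the map changes.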

Sivan_Sasidharan · May 3, 2016, 6:26am

Hi Cody,
Did you make any breakthrough on this?

Yongyao_Jiang · May 3, 2016, 2:33pm

Yes, I have got some ideas from reading the literature, and I am working on some of them.

If all you want is synonyms, you can use the out-of-the-box synonym support in ES or the plugin jprante developed.
But if you need something more complex, like what I described in my question, you can:
do latent semantic analysis (LSA) on your documents,
or use an existing ontology (WordNet for general purposes),
or build your own ontology (this is what I am working on: discovering semantic relations from user behavior).
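
For the plain-synonym case, a sketch of index settings that use the ES `synonym` token filter, built here as a Python dict. The filter name `ontology_synonyms`, the analyzer name `ontology_analyzer`, and the example group are placeholders; check the exact settings against the ES version you run.

```python
import json


def synonym_settings(groups,
                     filter_name="ontology_synonyms",
                     analyzer_name="ontology_analyzer"):
    """Build index settings with a synonym filter from groups of equivalent terms."""
    # Each rule is a comma-separated list of terms treated as equivalent.
    rules = [", ".join(group) for group in groups]
    return {
        "settings": {
            "analysis": {
                "filter": {
                    filter_name: {"type": "synonym", "synonyms": rules},
                },
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", filter_name],
                    },
                },
            }
        }
    }


body = synonym_settings([["sport", "tennis", "football"]])
print(json.dumps(body, indent=2))
```

Note that a rule like this treats all terms in a group as interchangeable, so it cannot express a graded relation between two terms.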

As you may know, semantic search is still under active research. There is no off-the-shelf tool you can use. Let me know if you have any ideas.

Cody

Sriharsha_Pothukuchi · June 16, 2016, 6:38am

Hello
@jprante / @Yongyao_Jiang

I am also working on a similar problem. For now, my ontology is just key-value pairs. I am thinking of using Redis or ES to index this semantic/synonym data. Then, for every search keyword, I would first query this index to fetch all similar keywords, and then query the actual data index with the weights in a should clause.

What challenges do you foresee in my approach, if you have thought along these lines?
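
A sketch of that two-step flow, with a plain dict standing in for the Redis/ES lookup index. The `SIMILAR` data and the `text` field name are assumptions; the output is a standard bool query whose should clauses carry the weights as boosts.

```python
# Hypothetical lookup table; in practice this would live in Redis or an ES index.
SIMILAR = {
    "sport": {"tennis": 0.8, "football": 0.7},
}


def build_query(keyword, lookup, field="text"):
    """Build a bool query with one boosted should clause per related keyword."""
    terms = {keyword: 1.0}  # the original keyword keeps full weight
    terms.update(lookup.get(keyword, {}))
    should = [
        {"match": {field: {"query": term, "boost": weight}}}
        for term, weight in sorted(terms.items(), key=lambda kv: -kv[1])
    ]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


query = build_query("sport", SIMILAR)
print(query)
```

The resulting dict can be sent as the body of a search request against the data index.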

Thanks
Sri Harsha

jprante · June 16, 2016, 7:43am

@Sriharsha_Pothukuchi that is what my reference plugin does: it fetches a list of variants and indexes them at index time together with the main form of a word.

I do not recommend query expansion on the server side by a plugin. It will add a lot of load to ES. Query expansion is better done on the client side. Note that a large number of should clauses leads to slow queries.

Yongyao_Jiang · June 16, 2016, 3:13pm

@Sriharsha_Pothukuchi

Hi Sri,

I actually ended up with pretty much the same approach as you. What @jprante describes is a good approach when your ontology/synonym list is static.

But if your ontology keeps changing or growing (e.g. you are mining knowledge from massive user search behavior), query-time expansion is probably the right direction; otherwise you have to re-index everything each time your ontology grows. Also, an index-time plugin usually assumes all of the associated words are equivalent. That becomes problematic when the similarity between A and B is somewhere in between, say 0.8.

Here is the project I am working on. Try searching for "ocean wind"
http://52.70.209.189:8080/ontology/index.html

Thanks,
Yongyao

jprante · June 16, 2016, 3:30pm

Yes, I assume re-indexing is cheap. A reference dictionary of ~10 million docs with ~40 million variant forms in the docs, with daily changes, can be indexed in ~10 minutes here.

sacherus · January 23, 2017, 9:16am

I can build a conceptual search with LDA/LSA + cosine similarity, and I believe it should give better results than a synonym ontology (especially when looking for long documents). Is there a way (presumably not) to apply this approach to ES?

Yongyao_Jiang · January 31, 2017, 4:13pm

@sacherus, I know someone is doing this for Solr, but it might be a bit hard to do with ES. An alternative approach is to store the keyword similarities in ES after performing LDA/LSA. When a keyword A comes in, we first find the N most related keywords and then use them to create a semantic boost query.
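
A rough sketch of the "N most related keywords" step, assuming each keyword already has a vector from LDA/LSA. The `VECTORS` values below are made up for illustration, and the similarities are computed in memory rather than read back from ES.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_related(keyword, vectors, n=2):
    """Return the n keywords most similar to `keyword` as (term, similarity) pairs."""
    target = vectors[keyword]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != keyword
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Made-up topic-space vectors standing in for LDA/LSA output.
VECTORS = {
    "ocean": [0.9, 0.1, 0.0],
    "sea": [0.8, 0.2, 0.0],
    "wind": [0.1, 0.9, 0.1],
    "tennis": [0.0, 0.1, 0.9],
}

related = top_related("ocean", VECTORS, n=2)
print(related)
```

The similarity values could then be stored per keyword pair and used as the boosts of the query, with N keeping the number of clauses small.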
