I have a task: given a document, I have to find the set of similar
documents.
The document has title and content fields. I'd like to use some kind
of cosine similarity.
My current approach is to represent the input document as a boolean query
with one clause per term, joined with OR.
This has some shortcomings:
- the query is too large (so I expect a drop in performance)
- I have to use the query_string query type, so I need my own query
  parser (I merge all terms from all the fields into one query and boost
  the terms that belong to the title)
My questions are: what is the best way to solve this task? Is
Elasticsearch/Lucene good for this kind of searching?
I would be much obliged,
-Kostya
P.S. What is the "MoreLikeThis" function? Is there any description of how it
works?
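
For reference, the kind of cosine similarity described above can be sketched in plain Python, independent of Elasticsearch. This is only an illustration under assumptions: the function names, the simple `\w+` tokenizer, and the title_boost value of 2.0 are all hypothetical, and raw term counts are used instead of the TF-IDF weights that Lucene scoring relies on.

```python
import math
import re
from collections import Counter


def term_vector(title, content, title_boost=2.0):
    """Build a weighted term-count vector for one document.

    Title terms count title_boost times, content terms count once.
    The tokenizer and the boost value are illustrative assumptions.
    """
    vec = Counter()
    for term in re.findall(r"\w+", title.lower()):
        vec[term] += title_boost
    for term in re.findall(r"\w+", content.lower()):
        vec[term] += 1.0
    return vec


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)
```

Comparing every pair of documents this way does not scale, which is why a search engine that selects only a few characteristic terms per document is attractive for this task.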
What about this: http://www.elasticsearch.org/guide/reference/api/more-like-this.html
and this: http://www.elasticsearch.org/guide/reference/query-dsl/mlt-query.html
?
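
A rough sketch of what a request body for the more-like-this query might look like, built as a Python dict. Treat it as an assumption-laden example: the helper name build_mlt_query, the index name and the document id are hypothetical, and the parameter names (fields, like, min_term_freq, min_doc_freq, max_query_terms) follow recent Elasticsearch documentation. Older releases, including the one the links above describe, used like_text instead of like, so check the docs for your version.

```python
import json


def build_mlt_query(index, doc_id, fields=("title", "content"),
                    max_query_terms=25):
    """Return a search body asking for documents similar to one
    already-indexed document (hypothetical helper, not a client API)."""
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                # Reference the source document instead of sending its text.
                "like": [{"_index": index, "_id": doc_id}],
                # Accept terms that occur only once in the source document.
                "min_term_freq": 1,
                # Accept terms that occur in only one indexed document.
                "min_doc_freq": 1,
                # Cap the number of selected terms to keep the query small.
                "max_query_terms": max_query_terms,
            }
        }
    }


if __name__ == "__main__":
    # "docs" and "42" are placeholder values for illustration only.
    print(json.dumps(build_mlt_query("docs", "42"), indent=2))
```

The max_query_terms cap speaks to the "query is too large" concern: only a limited number of the most characteristic terms are used rather than one OR clause per term in the document.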