Extension to MLT


(Ori Lahav) #1

Hi
As far as I can tell from the MLT (0.5) documentation, you can only run an MLT
query for a document that is already indexed.
We are looking for a slightly different implementation, where the input
document is a URL: the server extracts the most significant keywords from it
and returns the similar docs.

Any idea if ES supports it?


(Shay Banon) #2

elasticsearch has the option to execute a moreLikeThis query, which is part
of the query DSL. When using the query, you just provide it with text to
find docs that match it. So, in your case, fetch the doc, get the text from
it, and execute a moreLikeThis query.

-shay.banon
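A minimal sketch of that flow in Python, covering the "get the text from it" step and the query body. It assumes the fetched doc is HTML, strips the markup with the standard-library `html.parser`, and wraps the result in a `more_like_this` query. The field names (`title`, `body`) and the helper names are hypothetical; the text parameter is `like` in recent Elasticsearch versions, whereas the old 0.x releases discussed here used `like_text`.

```python
import json
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of `html` as one space-separated string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def build_mlt_query(text, fields, min_term_freq=1, max_query_terms=25):
    """Return a search body running a more_like_this query over `text`.

    Assumption: `like` is the parameter name in recent Elasticsearch
    versions; very old releases (0.x) expected `like_text` instead.
    """
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": text,
                "min_term_freq": min_term_freq,
                "max_query_terms": max_query_terms,
            }
        }
    }


if __name__ == "__main__":
    page = "<html><body><h1>Solr</h1><p>More like this</p></body></html>"
    print(json.dumps(build_mlt_query(html_to_text(page), ["title", "body"]), indent=2))
```

Joining the stripped fragments with single spaces is crude (a word broken up by inline markup gets split), but it is usually good enough here, since MLT only keeps the most significant terms anyway.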



(Ori Lahav) #3

Got you.
So it is not like Solr, where instead of text you can give it a URL to fetch
the text from.



(Shay Banon) #4

I have no idea how Solr does it.

-shay.banon



(Yatir Ben Shlomo-2) #5

Just FYI,
in Solr MLT you can supply a URL as a parameter to the request, and
Solr will access this URL, interpret the response as the contents of a
document, tokenize it, extract the interesting words from it, and
use them to perform an MLT query.
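For comparison, a sketch of the kind of Solr request being described, built in Python. It assumes a `MoreLikeThisHandler` registered at `/mlt` in `solrconfig.xml` and remote streaming enabled (the `enableRemoteStreaming` setting), since Solr rejects `stream.url` otherwise; the core URL, field names and the helper name are placeholders.

```python
from urllib.parse import urlencode


def solr_mlt_url(solr_base, page_url, fields, handler="/mlt"):
    """Build a Solr MLT request URL that asks Solr itself to fetch `page_url`.

    Assumptions: a MoreLikeThisHandler is mounted at `handler` and remote
    streaming is enabled on the Solr side.
    """
    params = {
        "stream.url": page_url,           # Solr downloads this document itself
        "mlt.fl": ",".join(fields),       # fields used to pick interesting terms
        "mlt.interestingTerms": "list",   # also return the extracted terms
        "wt": "json",
    }
    return f"{solr_base.rstrip('/')}{handler}?{urlencode(params)}"


if __name__ == "__main__":
    print(solr_mlt_url("http://localhost:8983/solr/core1",
                       "http://example.com/article.html", ["title", "body"]))
```

Here the server performs the fetch, which is exactly the behavior the rest of the thread weighs against fetching on the client.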



(Shay Banon) #6

I understand what it means. If you want this feature, then open a feature
request for it. In general, I prefer not to rely on external resources in
elasticsearch, and if I do, it should be done correctly. In this case, that
means using async I/O to fetch the doc rather than blocking a thread on an
I/O operation, which would need to be developed.

-shay.banon



(Shay Banon) #7

By the way, if you want to provide the text for the search, then you
probably want to use the search API with an mlt query. The mlt query lets
you provide the text to run MLT on. In this case, you fetch the text from the
URL on the client side and execute the search query with the text
populated. This makes more sense than fetching the text on each search
shard.

cheers,
shay.banon
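A minimal client-side sketch of this approach using only the Python standard library. The cluster address, index name (`docs`), fields and helper names are hypothetical placeholders, and the body uses the `like` parameter of recent Elasticsearch versions (older releases used `like_text`). The page is fetched once by the client, so the shards only ever receive plain query text.

```python
import json
import urllib.request


def fetch_text(url, timeout=10.0):
    """Download `url` on the client side and decode the body as text.

    A real client would also strip HTML markup before searching.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def make_mlt_search_request(es_base, index, text, fields):
    """Build (but do not send) a POST to `<es_base>/<index>/_search`.

    The request body carries a more_like_this query with the text inlined.
    """
    body = {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": text,
                "min_term_freq": 1,
            }
        }
    }
    return urllib.request.Request(
        url=f"{es_base.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def similar_to_url(page_url, es_base="http://localhost:9200", index="docs",
                   fields=("title", "body")):
    """Fetch `page_url` once, then run an MLT search with its text."""
    text = fetch_text(page_url)
    req = make_mlt_search_request(es_base, index, text, fields)
    with urllib.request.urlopen(req, timeout=30.0) as resp:
        return json.loads(resp.read())
```

Building the request separately from sending it keeps the query construction easy to inspect or test without a running cluster.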


