MoreLikeThis and array field

Hi,

We have a document like this in ES:
{ "ID":"1111", "Text":"This my text",
"Concept":["concept1","concept2","concept3"] }

We are trying to detect duplicates based on the field "concept", with a
rule like: if at least 85% of the concepts are the same, then it's a
duplicate.
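
To make the rule concrete, here is a minimal sketch in Python of the check we have in mind. The function names `concept_overlap` and `is_duplicate`, and the choice to measure overlap against the reference document's concepts, are assumptions for illustration only; this is client-side logic, not something MLT computes for us.

```python
def concept_overlap(reference, candidate):
    """Fraction of the reference document's concepts also found in the candidate."""
    ref = set(reference)
    if not ref:
        return 0.0
    return len(ref & set(candidate)) / len(ref)


def is_duplicate(reference, candidate, threshold=0.85):
    """Apply the 85% rule: enough shared concepts means duplicate."""
    return concept_overlap(reference, candidate) >= threshold


doc_a = ["concept1", "concept2", "concept3"]
doc_b = ["concept1", "concept2", "concept3", "concept4"]
doc_c = ["concept1", "concept9"]

print(is_duplicate(doc_a, doc_b))  # all 3 reference concepts are present
print(is_duplicate(doc_a, doc_c))  # only 1 of 3 reference concepts is present
```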

But with the following query, the results are not good:

$ curl -XGET 'http://localhost:9200/twitter/tweet/1/_mlt?mlt_fields=concept&min_doc_freq=1'

We had to change the JSON format (a string instead of an array) for MLT
to work as expected:

{ "ID":"1111", "Text":"This my text", "Concept":"concept1 concept2 concept3" }

Are we missing something? Is this a bug or a design limitation?

Thanks,

Gildas

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Up,

We would really prefer to store an array rather than a string.
Does anyone have an idea?

Gildas

On Friday, April 26, 2013 11:23:55 AM UTC-4, Gildas Houmard wrote:

[...]


On 29/04/13 13:57, Gildas Houmard wrote:

> Up,
>
> We would really prefer to store an array rather than a string.
> Does anyone have an idea?

What does your mapping look like? I've seen this happen when the field
type is inferred. I had to manually set the mapping of that type to be an
array type, and then all was well.
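
For reference, a rough sketch of what an explicit mapping might look like, built as a Python dict so it can be dumped to JSON and sent with a PUT to the type's `_mapping` endpoint. Assumptions: the index `twitter` and type `tweet` are taken from the curl command in the original post, the field name `Concept` from the sample document, and `not_analyzed` is only a guess at what is wanted, so that each concept stays a single term. To my understanding the mapping declares the element type of the field (here `string`), and a field declared that way accepts a single value or a list of values.

```python
import json

# Hypothetical explicit mapping for the "tweet" type (names taken from the thread).
# "not_analyzed" is an assumption: it keeps each concept as one exact term
# instead of letting an analyzer split or lowercase it.
mapping = {
    "tweet": {
        "properties": {
            "Concept": {"type": "string", "index": "not_analyzed"},
        }
    }
}

# JSON body that would be sent to /twitter/tweet/_mapping
body = json.dumps(mapping, indent=2)
print(body)
```

An explicit mapping normally has to be in place before documents are indexed, so an existing index would likely need to be recreated and reindexed for it to take effect.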

--
Cheers,
James Harrison
