ghoumard (Gildas Houmard), April 26, 2013, 3:23pm, post 1

Hi,
We have a document like this in ES: 
{ "ID":"1111", "Text":"This my text", 
"Concept":["concept1","concept2","concept3"] }
We are trying to detect duplicates based on the "concept" field, with a 
rule like: if at least 85% of the concepts are the same, then it's a 
duplicate.
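To make the 85% rule concrete, here is a minimal sketch in Python. The post does not say how the percentage is measured, so this assumes it is the share of the source document's concepts that also appear in the candidate; the function names `concept_overlap` and `is_duplicate` are hypothetical and are not part of Elasticsearch.

```python
def concept_overlap(source, candidate):
    """Fraction of the source's distinct concepts also present in the candidate."""
    source_set, candidate_set = set(source), set(candidate)
    if not source_set:
        return 0.0  # no concepts to compare, treat as no overlap
    return len(source_set & candidate_set) / len(source_set)


def is_duplicate(source, candidate, threshold=0.85):
    """Apply the rule from the post: at least 85% shared concepts (assumed definition)."""
    return concept_overlap(source, candidate) >= threshold


doc = ["concept1", "concept2", "concept3"]
other = ["concept1", "concept2", "concept4"]

print(concept_overlap(doc, other))          # two of three concepts are shared
print(is_duplicate(doc, other))             # below the 0.85 threshold
print(is_duplicate(doc, list(reversed(doc))))  # order does not matter
```

Under this definition the comparison ignores order and repeated values, which is one reason an array and a space-separated string of the same concepts should in principle be equivalent.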
But when we use the following query, the results are not good:
$ curl -XGET 'http://localhost:9200/twitter/tweet/1/_mlt?mlt_fields=concept&min_doc_freq=1'
We had to change the JSON format (a string instead of an array) for MLT 
to work as expected:
{ "ID":"1111", "Text":"This my text", "Concept":"concept1 concept2 concept3" }
Are we missing something? Is this a bug or a design limitation?
Thanks,
Gildas
-- 
You received this message because you are subscribed to the Google Groups "elasticsearch" group. 
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com . 
For more options, visit https://groups.google.com/groups/opt_out .

ghoumard (Gildas Houmard), April 29, 2013, 12:57pm, post 2

Up,
We would really prefer to store an array rather than a string. 
Any ideas, anyone?
Gildas
On 29/04/13 13:57, Gildas Houmard wrote:
Up,
We would really prefer to store an array rather than a string. 
Any ideas, anyone?

What does your mapping look like? I've seen this happen when the field type 
is inferred. I had to manually set the mapping of that type to be an 
array type, and then all was well.
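
For readers following along, here is a rough sketch of what setting the mapping by hand could look like. It only builds the JSON body in Python and sends nothing. As far as I know Elasticsearch has no dedicated array type (any field can hold several values), so the sketch declares the element type explicitly; the `string` type with `index: not_analyzed` is an assumption and is not stated anywhere in this thread. The `tweet` type and `Concept` field names are taken from the original post.

```python
import json

# Hypothetical explicit mapping for the "tweet" type.
# Assumption: each concept should be indexed as one exact, unanalyzed term.
mapping = {
    "tweet": {
        "properties": {
            "Concept": {"type": "string", "index": "not_analyzed"}
        }
    }
}

# Serialize the body; it could then be sent to the index's mapping endpoint.
body = json.dumps(mapping, indent=2)
print(body)
```

Whether this changes the MLT results would need to be tested. It may also be worth checking the field name itself: the sample document spells it `Concept` while the query passes `mlt_fields=concept`, and field names are case-sensitive.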
-- 
Cheers, 
James Harrison