Highlights are truncated?


(jmurdoch) #1

It appears that the results returned from a search query are
truncating the contents of the highlighted field. Am I doing something
wrong here? Notice how the 'S' of Susana is cut off in
the highlight of the OWNERFULLN field.

Thanks in advance!

Query:

{
  "query" : {
    "query_string" : {
      "query" : "Stump"
    }
  },
  "fields" : ["*"],
  "highlight" : {
    "fields" : {
      "docSource.feature.attributes.OWNERFULLN" : {
        "fragment_size" : 100
      }
    }
  }
}

Result:

{
  "took" : 16,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 6,
    "max_score" : 1.4052091,
    "hits" : [ {
      "_index" : "blah",
      "type" : "blahblah",
      "id" : "-1668604027 -2041742376__261713",
      "_score" : 1.4052091,
      "fields" : {
        "docSource.feature.attributes.OWNERFULLN" : "Susana Stump",
        "docLastCrawlDate" : "2011-07-25T13:52:04.000Z",
        "docSource.feature.attributes.OWNERFIRST" : "Susana",
        "docSource.feature.attributes.ZIPCODE" : "28216",
        "docSource.feature.attributes.SHAPE_Length" : 234.4772918513361,
        "docSource.feature.attributes.LANDVALUE" : 30000.0,
        "docSource.feature.attributes.STNAMEFULL" : "10003 Lomax Ridge DR",
        "docSource.feature.attributes.OWNERLASTN" : "Stump",
        "docSource.feature.attributes.HOUSENO" : "10003",
        "docSource.feature.attributes.STTYPE" : "DR"
      },
      "highlight" : {
        "docSource.feature.attributes.OWNERFULLN" : [ "usana Stump " ]
      }
    }, ...............


(jmurdoch) #2

I should mention that this appears to be the case for both versions
0.17.1 and 0.17.2.

Thanks!



(Nicolas Lalevée) #3

The Lucene highlighter is not very smart about how it chooses the fragment to highlight.
My solution was to turn fragmenting off and do the split myself on the client side.
Further reading: http://www.elasticsearch.org/guide/reference/api/search/highlighting.html
See number_of_fragments; I've set it to 0, which returns the whole field content, highlighted.

Nicolas
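
For anyone who wants to try the approach described above, here is a minimal Python sketch. It is not Nicolas's code: the helper names build_query and split_fragments, the context parameter, and the word-boundary logic are assumptions made for illustration. Only the field name and the number_of_fragments setting come from this thread. It also assumes the default <em>/</em> highlight tags, which contain no whitespace.

```python
import re

# Field name taken from the original query in this thread.
FIELD = "docSource.feature.attributes.OWNERFULLN"


def build_query(term):
    """Search body asking for the whole field, highlighted, with no fragmenting."""
    return {
        "query": {"query_string": {"query": term}},
        "fields": ["*"],
        # number_of_fragments = 0 disables fragmenting for this field
        "highlight": {"fields": {FIELD: {"number_of_fragments": 0}}},
    }


def split_fragments(highlighted, context=40, pre="<em>", post="</em>"):
    """Hypothetical client-side splitter: one snippet per highlighted match.

    Each snippet keeps roughly `context` characters on either side of the
    match and is widened to whitespace so that no word is cut in half.
    """
    pattern = re.compile(re.escape(pre) + r".*?" + re.escape(post), re.DOTALL)
    fragments = []
    for match in pattern.finditer(highlighted):
        start = max(0, match.start() - context)
        end = min(len(highlighted), match.end() + context)
        # Move the left edge back to the start of the word it landed in
        while start > 0 and not highlighted[start - 1].isspace():
            start -= 1
        # Move the right edge forward to the end of the word it landed in
        while end < len(highlighted) and not highlighted[end].isspace():
            end += 1
        fragments.append(highlighted[start:end].strip())
    return fragments
```

With a short field such as the owner name here, split_fragments simply returns the full highlighted value, so the leading 'S' is kept. For long fields, lower context to get shorter snippets.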



(jmurdoch) #4

Thanks for the tip, Nicolas!
That will work just fine.

Cheers,
Jen


