Can't sort on string of many words


(cgascon) #1

Hi,

I'm trying to sort documents based on a field named "sortableSubject" (not analyzed, type string).

But even though it's not analyzed, it throws the following exception:

"ReduceSearchPhaseException: Failed to execute phase [fetch], [reduce] ; shardFailures {[1][1][4]: QueryPhaseExecutionException[[1][4]: query[+(body:body* | subject:body*)],from[0],size[5000],sort[<custom:"sortableSubject": org.elasticsearch.index.field.data.strings.StringFieldDataType$1@14637eb>!]: Query Failed [Failed to execute main query]]; nested: IOException[Can't sort on string types with more than one value per doc, or more than one token per field]; }{[1][1][0]: QueryPhaseExecutionException[[1][0]: query[+(body:body* | subject:body*)],from[0],size[5000],sort[<custom:"sortableSubject": org.elasticsearch.index.field.data.strings.StringFieldDataType$1@1b72b99>!]: Query Failed [Failed to execute main query]]; nested: IOException[Can't sort on string types with more than one value per doc, or more than one token per field]; }{[1][1][3]: QueryPhaseExecutionException[[1][3]: query[+(body:body* | subject:body*)],from[0],size[5000],sort[<custom:"sortableSubject": org.elasticsearch.index.field.data.strings.StringFieldDataType$1@547ea3>!]: Query Failed [Failed to execute main query]]; nested: IOException[Can't sort on string types with more than one value per doc, or more than one token per field]; }{[1][1][1]: QueryPhaseExecutionException[[1][1]: query[+(body:body* | subject:body*)],from[0],size[5000],sort[<custom:"sortableSubject": org.elasticsearch.index.field.data.strings.StringFieldDataType$1@8f7cf4>!]: Query Failed [Failed to execute main query]]; nested: IOException[Can't sort on string types with more than one value per doc, or more than one token per field]; }{[1][1][2]: QueryPhaseExecutionException[[1][2]: query[+(body:body* | subject:body*)],from[0],size[5000],sort[<custom:"sortableSubject": org.elasticsearch.index.field.data.strings.StringFieldDataType$1@1015d59>!]: Query Failed [Failed to execute main query]]; nested: IOException[Can't sort on string types with more than one value per doc, or more than one token per field]; }

Am I missing something? Can't we sort documents on a string of many words?


(Shay Banon) #2

Can you gist a recreation (http://www.elasticsearch.org/help)? If it's not
analyzed, this should not happen.

(In reply to cgascon's message #1, sent Mon, Jul 18, 2011 at 11:52 PM.)


(cgascon) #3

This happened while unit testing (local node), and I just realized the mapping was never actually put, so I guess the default mapping was used (the field was therefore analyzed, which caused the problem when I sorted on it).

Thanks though!


(yin weifeng) #4

Hi cgascon, I have the same problem. Can you tell me how you fixed it? Thank you!


(cgascon) #5

Hi yin weifeng,

The problem was the mapping.

Here is the mapping I have now (I use two properties). Please note that I copied this from Java code and removed some backslashes etc., so there may be some syntax errors.

[...]
"subject":{"type":"string"},
"sortableSubject":{"index":"not_analyzed","type":"string"}
[...]

The problem was that this mapping was not actually in place, although I thought it was. sortableSubject was therefore an analyzed string.

So I got rid of my old mapping and made sure to create this mapping for the object these properties are in.
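
One way to catch this kind of mistake early is to read the mapping back and check the field before sorting on it. Below is a minimal sketch in Python that works on the mapping as a plain dict, so it runs without a cluster. The type name "message", the helper is_not_analyzed_string and the assumed layout (type name, then "properties") are illustrative assumptions, not taken from the original code.

```python
import json

# Hypothetical type name; the real one is whatever the documents are indexed as.
TYPE_NAME = "message"

# The intended mapping from the post, written out in full.
intended_mapping = {
    TYPE_NAME: {
        "properties": {
            "subject": {"type": "string"},
            "sortableSubject": {"type": "string", "index": "not_analyzed"},
        }
    }
}

# Roughly what a dynamically created (default) mapping looks like:
# the field exists, but nothing marks it not_analyzed, so it is analyzed.
default_mapping = {
    TYPE_NAME: {"properties": {"sortableSubject": {"type": "string"}}}
}


def is_not_analyzed_string(mapping, type_name, field):
    """Return True if `field` is mapped as a not_analyzed string."""
    props = mapping.get(type_name, {}).get("properties", {})
    spec = props.get(field, {})
    return spec.get("type") == "string" and spec.get("index") == "not_analyzed"


print(json.dumps(intended_mapping, indent=2))
print(is_not_analyzed_string(intended_mapping, TYPE_NAME, "sortableSubject"))
print(is_not_analyzed_string(default_mapping, TYPE_NAME, "sortableSubject"))
```

In a unit test, the dict passed to the helper would be whatever the get-mapping call returns for the index, and the test would fail fast if the check returns False.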

I hope it helps.


(louis gueye) #6

Hi yin,

A string, when analyzed, gets split into tokens.

So imagine these two strings:

"can you tall me how do you fixed it"
"you can not afford that"

After analysis, the first one results (very roughly) in ["can" : 1, "you" : 2,
"tall" : 1, "me" : 1, "how" : 1, "do" : 1, "fixed" : 1, "it" : 1].
The second one results (very roughly) in ["you" : 1, "can" : 1, "not" : 1,
"afford" : 1, "that" : 1].

When you search for "you" you will get 2 hits, but for sorting purposes the
original field value no longer exists as a single value: it has been split
into occurrences by token. How do you sort then?
You can sort by score: the document with the highest token count comes
first (call it ranking, relevance, whatever).
That is why lexicographic sorting cannot be applied to ANALYZED fields,
whereas it can be applied to NOT ANALYZED ones.
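
To make the token picture concrete, here is a small Python sketch. The function rough_analyze is a made-up stand-in that only lowercases and splits on whitespace; a real analyzer does more (punctuation handling, possibly stop-word removal), so treat this as an illustration only.

```python
from collections import Counter


def rough_analyze(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


docs = [
    "can you tall me how do you fixed it",
    "you can not afford that",
]

for doc in docs:
    counts = Counter(rough_analyze(doc))
    # An analyzed field yields several terms per document, so there is no
    # single value to compare when sorting.
    print(sorted(counts.items()))

# A not_analyzed field keeps the whole string as one term, so a plain
# lexicographic comparison is well defined.
print(sorted(docs))
```

Each analyzed document prints as a list of several (token, count) pairs, while the last line orders the two untouched strings as whole values.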

There is a workaround: store 2 versions of your field, one that is
analyzed and one that is not, and sort by the one that is not
analyzed.
This is exactly what the *multi_field* mapping type does.
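
As a rough sketch of what that could look like, the Python below builds a multi_field mapping and a search body as plain dicts and prints them as JSON. The sub-field name "untouched" is arbitrary, the query string is borrowed from the exception earlier in the thread, and the syntax is assumed to match the Elasticsearch releases of this thread's era (as far as I know, later versions replaced multi_field with a "fields" option on the field itself).

```python
import json

# One analyzed version for searching, one not_analyzed version for sorting.
# "untouched" is an arbitrary sub-field name chosen for this sketch.
subject_mapping = {
    "subject": {
        "type": "multi_field",
        "fields": {
            "subject": {"type": "string", "index": "analyzed"},
            "untouched": {"type": "string", "index": "not_analyzed"},
        },
    }
}

# Search against the analyzed field, sort on the not_analyzed sub-field.
search_body = {
    "query": {"query_string": {"query": "subject:body*"}},
    "sort": [{"subject.untouched": {"order": "asc"}}],
}

print(json.dumps({"properties": subject_mapping}, indent=2))
print(json.dumps(search_body, indent=2))
```

With a mapping like this, full-text queries keep using "subject" while the sort clause refers to "subject.untouched", so a separate sortableSubject property is not needed.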

Hope it helps.


(In reply to yin weifeng's message #4 of 2012/2/2.)

