Sorting by phrase instead of sorting by term


(Vladimir Khazin) #1

Stripped down sample data:

"hits": [
{
"_index": "gladiator",
"_type": "asset",
"_id": "014ac4d4-09ec-4319-97d0-29ed004bf20c",
"_score": 1,
"_source": {
"Id": "014ac4d4-09ec-4319-97d0-29ed004bf20c",
"Title": "Battle of the Year: Pressure"
}
},
{
"_index": "gladiator",
"_type": "asset",
"_id": "0d27e5db-31d6-422d-a91e-2d7b4cddb2cd",
"_score": 1,
"_source": {
"Id": "0d27e5db-31d6-422d-a91e-2d7b4cddb2cd",
"Title": "The Grandmaster: Train Fight (MBR DRM test)"
}
},
{
"_index": "gladiator",
"_type": "asset",
"_id": "2a70e840-3608-40df-a47e-2e893ce17d81",
"_score": 1,
"_source": {
"Id": "2a70e840-3608-40df-a47e-2e893ce17d81",
"Title": "CBGB: The Ramones"
}
},
{
"_index": "gladiator",
"_type": "asset",
"_id": "78c21b5a-16de-419d-b655-2ca400dddf07",
"_score": 1,
"_source": {
"Id": "78c21b5a-16de-419d-b655-2ca400dddf07",
"Title": "Eat"
}
},
{
"_index": "gladiator",
"_type": "asset",
"_id": "fbd61536-cd8c-4b03-8331-32738bdd5c1b",
"_score": 1,
"_source": {
"Id": "fbd61536-cd8c-4b03-8331-32738bdd5c1b",
"Title": "Facade"
}
},
{
"_index": "gladiator",
"_type": "asset",
"_id": "45e2eebc-4a23-48f4-8d79-2f831d4d4e2d",
"_score": 1,
"_source": {
"Id": "45e2eebc-4a23-48f4-8d79-2f831d4d4e2d",
"Title": "Enough Said"
}
},
{
"_index": "gladiator",
"_type": "asset",
"_id": "7a250a52-34d0-43da-adb7-2c8a40550de5",
"_score": 1,
"_source": {
"Id": "7a250a52-34d0-43da-adb7-2c8a40550de5",
"Title": "Her"
}
},
{
"_index": "gladiator",
"_type": "asset",
"_id": "148cfb62-0fb2-4e7f-bb39-27bb3b06e980",
"_score": 1,
"_source": {
"Id": "148cfb62-0fb2-4e7f-bb39-27bb3b06e980",
"Title": "12 Years A Slave a March07bc -- See where these changes go"
}
},
{
"_index": "gladiator",
"_type": "asset",
"_id": "70fa68cb-4e8f-4c4a-8214-2e2b06f14e0f",
"_score": 1,
"_source": {
"Id": "70fa68cb-4e8f-4c4a-8214-2e2b06f14e0f",
"Title": "Jimmy Jamal, Super Beetle / I'm With Stupid"
}
},
{
"_index": "gladiator",
"_type": "asset",
"_id": "da3bd5c7-763f-4181-9f1e-2a7618a6cd2b",
"_score": 1,
"_source": {
"Id": "da3bd5c7-763f-4181-9f1e-2a7618a6cd2b",
"Title": "Weird Science Trailer"
}
}
]

Sample query:
{
"explain": true,
"sort": [
{
"Title": {
"order": "asc"
}
}
],
"_source": [
"Id",
"Title"
]
}

Output:
"hits": [
{
"_shard": 0,
"_node": "gDxBvtOYSI2VSVxgh14ZpQ",
"_index": "gladiator",
"_type": "asset",
"_id": "ca23459f-cc96-46cb-8ae8-509368467670",
"_score": null,
"_source": {
"Id": "ca23459f-cc96-46cb-8ae8-509368467670",
"Title": "TPTest Scaling 10:3"
},
"sort": [
"10"
],
"_explanation": {
"value": 1,
"description": "ConstantScore(cache(_type:asset)), product of:",
"details": [
{
"value": 1,
"description": "boost"
},
{
"value": 1,
"description": "queryNorm"
}
]
}
},
{
"_shard": 0,
"_node": "gDxBvtOYSI2VSVxgh14ZpQ",
"_index": "gladiator",
"_type": "asset",
"_id": "148cfb62-0fb2-4e7f-bb39-27bb3b06e980",
"_score": null,
"_source": {
"Id": "148cfb62-0fb2-4e7f-bb39-27bb3b06e980",
"Title": "12 Years A Slave a March07bc -- See where these changes go"
},
"sort": [
"12"
],
"_explanation": {
"value": 1,
"description": "ConstantScore(cache(_type:asset)), product of:",
"details": [
{
"value": 1,
"description": "boost"
},
{
"value": 1,
"description": "queryNorm"
}
]
}
},
...
]

Problem:
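The sort order is based on individual terms, not on the whole phrase. In
the output above, the first hit is sorted on the term "10" taken from
"TPTest Scaling 10:3". That is not the order a person would expect when
sorting by title, where '12 Years A Slave' should have been the first item
on the list.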

Potential Solutions:

  1. Map the field as not_analyzed in _mapping. The challenge is that the
    field still needs to be searchable by terms.
  2. Map the field as a multi-field, with one analyzed and one not_analyzed
    version. The challenge is that the end user can sort by any column, so
    pretty much every field would require this special mapping. Please note
    that the sample data is not representative of the true document size:
    real documents contain hundreds of fields and the structure changes
    often, so multi-fielding every field on an ongoing basis is not
    desirable.
  3. Query syntax has a double-quote notation for searching by phrase
    instead of by term. Is there any special notation in the sort syntax to
    sort by phrase instead of by term?
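
To make option 2 concrete, here is a rough sketch of what the sample query might look like if Title were mapped as a multi-field. It assumes a not_analyzed sub-field exists under Title; the sub-field name "raw" is hypothetical and is not part of the current mapping.

```json
{
  "explain": true,
  "sort": [
    {
      "Title.raw": {
        "order": "asc"
      }
    }
  ],
  "_source": [
    "Id",
    "Title"
  ]
}
```

Sorting on the not_analyzed value compares the whole title as a single term, so '12 Years A Slave ...' would be expected to come before 'TPTest Scaling 10:3'. The comparison is still case-sensitive, with uppercase letters ordered before lowercase ones.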

Any other options/suggestions?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fb415851-c0ca-47de-9132-948556fab5f0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Binh Ly-2) #2

Unfortunately, you cannot decide this at query-time. Sorting is most
efficient on single-unanalyzed field values. I'd probably limit the fields
that you can sort on and then multi-field them all. I don't know if this
would make it easier, but you can define dynamic templates for field
mappings wherein you can specify a single rule to map many fields in a
certain way:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-root-object-type.html#_dynamic_templates

Also remember that sorting requires a memory structure called fielddata
which can become a problem if you don't control what fields your users can
sort on.
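
As a rough sketch of the dynamic template idea, a single rule along these lines could give every newly detected string field an analyzed main field for searching plus a not_analyzed sub-field for sorting. It assumes the 1.x-style string mapping syntax and would be sent as the body of an index-creation request. The template name "strings_with_raw" and the sub-field name "raw" are made up for illustration; the "asset" type comes from the sample data in this thread.

```json
{
  "mappings": {
    "asset": {
      "dynamic_templates": [
        {
          "strings_with_raw": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "analyzed",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      ]
    }
  }
}
```

With something like this in place, a sort on "Title.raw" would use the unanalyzed value. Dynamic templates only apply to fields that are added after the template is defined, so existing data would need to be reindexed, and every sub-field that users actually sort on still loads fielddata.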



(Vladimir Khazin) #3

Hello Binh,

Thank you for your comments!
I was afraid that's the answer I was about to get.

On Thursday, April 3, 2014 5:34:18 PM UTC-4, Binh Ly wrote:

Unfortunately, you cannot decide this at query-time. Sorting is most
efficient on single-unanalyzed field values. I'd probably limit the fields
that you can sort on and then multi-field them all. I don't know if this
would make it easier, but you can define dynamic templates for field
mappings wherein you can specify a single rule to map many fields in a
certain way:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-root-object-type.html#_dynamic_templates

Also remember that sorting requires a memory structure called fielddata
which can become a problem if you don't control what fields your users can
sort on.


