Sorting a string field numerically

Another index-time possibility is to use a magnitude notation prefix for all the numeric portions.

So - if the number is 1 digit long, prefix it with a '1', if it is two digits, prefix with a '2', and so on. If you think you'll exceed 10 digits anywhere, then use a two-digit prefix (i.e. 01 and 02 respectively for a 1 digit and a 2 digit number). That gets around having long tokens (e.g. a whole bunch of leading zeroes) which might be an issue, and also allows you some more flexibility on how long an integer you might encounter without having to re-index everything to put more leading zeroes on the front if you exceed your initial 'max length' number. (i.e. a 2 digit mag notation prefix will handle up to a 99 digit number)

So, 1, 2, 10, 20 would be indexed as 011, 012, 0210, 0220, which will sort 'alphabetically' (i.e. LTR sort) in the correct numeric sequence.

Bob Sandiford | Principal Engineer | SirsiDynix
P: 800.288.8020 X6943 | Bob.Sandiford@sirsidynix.commailto:Bob.Sandiford@sirsidynix.com
www.sirsidynix.comhttp://www.sirsidynix.com/

Join the conversation: Like us on Facebook!http://www.facebook.com/SirsiDynix Follow us on Twitter!http://twitter.com/SirsiDynix

From: Igor Motov-3 [via Elasticsearch Users] [mailto:ml-node+s115913n4024574h41@n3.nabble.com]
Sent: Monday, October 29, 2012 9:58 AM
To: Bob Sandiford
Subject: Re: Sorting a string field numerically

I think the idea that Radu outlined might work. But I would propose a few changes to the implementation. First of all, retrieving and parsing source for every single result can be very expensive especially if records have large sources. So, it might be better to replace _source lookup with doc or _field lookup.

The expression "_source.foo is Integer" would work only if "foo" was an integer in the source. In other words, if it was indexed as {"foo": 123} it would work, but if it was indexed as {"foo": "123"} it wouldn't. In order to handle the latter case, we need to actually parse the string.

To answer, Nick's question, we can simplify handling of non-numeric cases, by switching from custom_score query to sort script .

So, if we combine all these suggestions, we will get something like this:

{
"query" : {
"match_all": {}
},
"sort": {
"_script":{
"script" : "s = doc['''foo'''].value; n = org.elasticsearch.common.primitives.Ints.tryParse(s); if (n != null) { String.format("%010d",n)} else { s }",
"type" : "string"
}
}
}'

That still might be too slow for the large result list. So, an ideal solution here would be to pad integer values with 0 so all integers have the same size during indexing. In other words, do the same thing that the script above is doing but during indexing.

Here is a complete example if somebody wants to play with it: Sorting mixed integer/string field · GitHub

On Monday, October 29, 2012 8:59:26 AM UTC-4, Nick Hoffman wrote:
Hi Radu. That's an interesting idea. How would you score fields that aren't numeric, though?

On Monday, 29 October 2012 05:49:37 UTC-4, Radu Gheorghe wrote:
Hello,

You can use the _source field, if you got it stored, in a script
within a custom_score query. For example, assuming "foo" is the name
of our field:

curl -XPOST localhost:9200/test/test/_search?pretty=true -d '{
"query": {
"custom_score": {
"query": {
"match_all": {}
},
"script": "if (_source.foo is Integer) { _source.foo }
else { _score/5 }"
}
}
}'

This will get your numeric values sorted as numbers. You'd need to
fill out the "else" part of the script with whatever values you want
to give to your non-numeric fields.

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

On Sun, Oct 28, 2012 at 9:10 PM, Nick Hoffman <[hidden email]</user/SendEmail.jtp?type=node&node=4024574&i=0>> wrote:

Hah, I asked the exact same question:
Redirecting to Google Groups

On Sunday, 28 October 2012 14:45:25 UTC-4, James Hu wrote:

I have a field that is string type. Sometimes it will contain integer
values such as 1, 2, 10, 20. Currently how it sorts those is 1, 10, 2, 20.
How do I go aboutsorting that numerically so that it's 1, 2, 10, 20?

--

--


If you reply to this email, your message will be added to the discussion below:
http://elasticsearch-users.115913.n3.nabble.com/Sorting-a-string-field-numerically-tp4024557p4024574.html
To start a new topic under Elasticsearch Users, email ml-node+s115913n115913h50@n3.nabble.commailto:ml-node+s115913n115913h50@n3.nabble.com
To unsubscribe from Elasticsearch Users, click herehttp://elasticsearch-users.115913.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=115913&code=Ym9iLnNhbmRpZm9yZEBzaXJzaWR5bml4LmNvbXwxMTU5MTN8LTIxMTYxMTI0NTQ=.
NAMLhttp://elasticsearch-users.115913.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html!nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers!nabble%3Aemail.naml-instant_emails!nabble%3Aemail.naml-send_instant_email!nabble%3Aemail.naml