Strange behaviour when sorting a field that contains a special char


(Roman Kournjaev) #1

Hi

I am seeing somewhat strange and unexpected behaviour when sorting
documents by name.

For instance, my documents have the following names:

  • Craftsman
  • Arroyo Craftsman
  • Guild of Master Craftsman Publications Ltd
  • Craftsman House
  • craftsman.com
  • Craftsman Evolv

The query is:

    QueryBuilder queryBuilder = textPhrasePrefixQuery("name", query).boost(1.0f);
    SearchResponse response = client.prepareSearch("tags").setTypes("tag")
            .setQuery(queryBuilder)
            .setFrom(0).setSize(100).setExplain(true)
            // .addSort("name.sort", SortOrder.DESC)
            .execute()
            .actionGet();

The document named craftsman.com comes first when sorting DESC and last
when sorting ASC, which is clearly not what I expected.
Has anyone experienced something similar?

I also wanted to debug ES to see what the problem is, but I am struggling
to find an entry point for my breakpoint. Do you know a good place to
start?

Thanks
Roman

--


(Roman Kournjaev) #2

Never mind, we found out that sorting is case-sensitive by default, so that
was the issue.
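
For anyone landing here later, the effect can be reproduced outside ES with a minimal sketch in plain Java. It does not use any Elasticsearch API; it only assumes that the sort field holds each name as a single, unmodified value. The class and method names are made up for illustration. Java's `String.compareTo` compares UTF-16 code units rather than bytes, but for ASCII names like these the resulting order is the same: every uppercase letter sorts before every lowercase one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CaseSensitiveSortDemo {

    // The six names from the original post.
    public static final List<String> NAMES = Arrays.asList(
            "Craftsman",
            "Arroyo Craftsman",
            "Guild of Master Craftsman Publications Ltd",
            "Craftsman House",
            "craftsman.com",
            "Craftsman Evolv");

    // Ascending sort with String's natural, case-sensitive ordering.
    public static List<String> sortedAscending() {
        List<String> copy = new ArrayList<>(NAMES);
        Collections.sort(copy);
        return copy;
    }

    // Descending sort: the ascending result reversed.
    public static List<String> sortedDescending() {
        List<String> copy = sortedAscending();
        Collections.reverse(copy);
        return copy;
    }

    public static void main(String[] args) {
        // 'c' (0x63) is greater than 'A', 'C' and 'G' (0x41, 0x43, 0x47),
        // so "craftsman.com" ends up last in ASC and first in DESC.
        System.out.println("ASC:  " + sortedAscending());
        System.out.println("DESC: " + sortedDescending());
    }
}
```

The lowercase "c" is what pushes craftsman.com to the end of the ascending order, not the dot.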

--


(David Pilato) #3

It could be related to your mapping. If you use the default mapping, your name is broken into 2 tokens: craftsman and com.

HTH

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--


(David Pilato) #4

Sorry. I answered too late.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--


(rickcrawford) #5

Sorting follows Unicode byte ordering for strings, so make sure to use
numerics for sorting when possible, or use a keyword tokenizer + lowercase
filter to normalize your strings as much as possible.
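
As a rough illustration of what that normalization buys you, here is a small sketch in plain Java. It is not ES code: it only emulates the idea of keeping the whole name as one sort key and lowercasing it, which is what an analyzer built from a keyword tokenizer and a lowercase filter would be expected to produce. `NormalizedSortDemo`, `sortKey` and `sortedByNormalizedKey` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class NormalizedSortDemo {

    // Emulates the normalized sort value: the whole name as a single
    // lowercased key (Locale.ROOT avoids locale-specific case rules).
    public static String sortKey(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    // Sorts by the normalized key but returns the original names.
    public static List<String> sortedByNormalizedKey(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparing(NormalizedSortDemo::sortKey));
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "Craftsman",
                "Arroyo Craftsman",
                "Guild of Master Craftsman Publications Ltd",
                "Craftsman House",
                "craftsman.com",
                "Craftsman Evolv");
        // With lowercased keys, craftsman.com sorts next to the other
        // "Craftsman ..." names instead of after everything else.
        for (String name : sortedByNormalizedKey(names)) {
            System.out.println(name);
        }
    }
}
```

Note that the ordering is still by character code after lowercasing, so a space sorts before a dot and "Craftsman House" still comes before "craftsman.com".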

--

