How to get Alphabetic Facet (A-Z) on name field in elasticsearch

Hi,

I have a name field in my documents and I want to get facets based on the
first letter of the name instead of the full name.

For example:

If I have docs with the following names

  1. Maaz
  2. John
  3. Sarah
  4. Symonds

The facets response should be something like
{
"facets": {
"name": {
"_type": "terms",
"missing": 0,
"total": 4,
"other": 0,
"terms": [
{
"term": "S",
"count": 2
},
{
"term": "M",
"count": 1
},
{
"term": "J",
"count": 1
}
]
}
}
}

Is it possible to do this using Elasticsearch terms facets, or do I have to do
it programmatically?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hey there,

you can use a terms facet with scripting, for example:

curl -X PUT localhost:9200/test/test/1 -d '{ "name":"Alex" }'
curl -X PUT localhost:9200/test/test/2 -d '{ "name":"Berta" }'
curl -X PUT localhost:9200/test/test/3 -d '{ "name":"Caesar" }'
curl -X PUT localhost:9200/test/test/4 -d '{ "name":"Andre" }'

curl -X POST 'localhost:9200/test/test/_search?pretty' -d '{"query": {
"match_all": {}} , "facets" : { "myName" : { "terms" : { "field":"name",
"script" : "term[0]" } } } }'

Result of the facet:

"facets" : {
"myName" : {
"_type" : "terms",
"missing" : 0,
"total" : 4,
"other" : 0,
"terms" : [ {
"term" : "a",
"count" : 2
}, {
"term" : "c",
"count" : 1
}, {
"term" : "b",
"count" : 1
} ]
}
}

Hope it helps. Note that scripting is of course not as fast as faceting on a
separate field that contains only the first letter.
For more information, check out the 'Term scripts' part of the terms facet documentation.
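
The separate first-letter field mentioned above could be prepared on the client before indexing. This is only a minimal sketch: the field name name_first_letter and the helper with_first_letter are hypothetical and do not come from this thread, and the new field would still need a not_analyzed mapping to facet cleanly.

```python
def with_first_letter(doc, source_field="name", target_field="name_first_letter"):
    """Return a copy of doc with the first letter of source_field added.

    target_field is a hypothetical field name chosen for this sketch.
    """
    enriched = dict(doc)
    value = (doc.get(source_field) or "").strip()
    if value:
        # Upper-case so that "alex" and "Alex" land in the same bucket.
        enriched[target_field] = value[0].upper()
    return enriched


docs = [{"name": "Maaz"}, {"name": "John"}, {"name": "Sarah"}, {"name": "Symonds"}]
for doc in docs:
    print(with_first_letter(doc))
```

Documents without a usable name are returned unchanged, so they would simply be counted as missing by the facet.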

--Alex

Hey,

just a small but important note on my sample: I ran it on an analyzed
field, which you should not do. Make sure you run it on a not_analyzed
field; otherwise your facet will return values for each term.
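
To see why the field type matters, here is a rough Python simulation (not Elasticsearch code). It assumes an analyzed field behaves roughly like lowercasing and splitting on non-word characters, which is a simplification of the default analyzer, and that a not_analyzed field keeps the whole value as a single term. The multi-word names are made up for illustration.

```python
import re
from collections import Counter


def analyzed_terms(value):
    # Simplified stand-in for an analyzed field: lowercase, split into words.
    return [t for t in re.split(r"\W+", value.lower()) if t]


def not_analyzed_terms(value):
    # A not_analyzed field indexes the whole value as one term.
    return [value]


def first_letter_counts(values, to_terms):
    # Mimics the "term[0]" script: one bucket entry per indexed term.
    counts = Counter()
    for value in values:
        for term in to_terms(value):
            counts[term[0]] += 1
    return dict(counts)


names = ["Maaz Bin Tariq", "John Smith"]
print(first_letter_counts(names, analyzed_terms))
print(first_letter_counts(names, not_analyzed_terms))
```

In the analyzed case every word of a name contributes its own letter, while the not_analyzed case yields exactly one letter per document.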

--Alex

On Thu, 2013-03-28 at 09:49 +0100, Alexander Reelsen wrote:

Hey there,

curl -X POST 'localhost:9200/test/test/_search?pretty' -d '{"query":
{ "match_all": {}} , "facets" : { "myName" : { "terms" :
{ "field":"name", "script" : "term[0]" } } } }'

While this might work, it's always more efficient to prepare your data
according to your needs, rather than trying to bolt things on
afterwards. It'll perform better and use less memory.

So if you want to sort on the first letter of a field, then index the
first letter into a different field.

You could even use multi-fields for this, e.g. the "name" field is
analyzed as usual, and the "name.first_letter" field just indexes the first
letter of the name.

Create an index using the pattern tokenizer to capture just the first
letter of each name string. The main "name" field uses the default
analyzer, and the "name.first_letter" field uses our custom analyzer:

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
"settings" : {
"analysis" : {
"analyzer" : {
"first_letter" : {
"filter" : [
"lowercase"
],
"tokenizer" : "first_letter"
}
},
"tokenizer" : {
"first_letter" : {
"pattern" : "^(.).*$",
"group" : 1,
"type" : "pattern"
}
}
}
},
"mappings" : {
"test" : {
"properties" : {
"name" : {
"fields" : {
"name" : {
"type" : "string"
},
"first_letter" : {
"type" : "string",
"analyzer" : "first_letter"
}
},
"type" : "multi_field"
}
}
}
}
}
'
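
The effect of the first_letter analyzer can be approximated outside Elasticsearch. The sketch below uses Python's re module with the same pattern and capture group, then lowercases the result the way the lowercase filter would. It is only an approximation: the pattern tokenizer uses Java regular expressions, and the function name first_letter_token is made up for this example.

```python
import re

# Same pattern as the "first_letter" tokenizer; group 1 is the first character.
FIRST_LETTER = re.compile(r"^(.).*$")


def first_letter_token(value):
    match = FIRST_LETTER.match(value)
    if match is None:
        # Nothing to capture, e.g. for an empty string.
        return None
    # The analyzer's "lowercase" filter is applied after tokenizing.
    return match.group(1).lower()


for name in ["Alex", "Berta", "Caesar", "Andre"]:
    print(name, "->", first_letter_token(name))
```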

Index some data

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1' -d '
{ "name" : "Alex" }
'
curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1' -d '
{ "name" : "Berta" }
'
curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1' -d '
{ "name" : "Caesar" }
'
curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1' -d '
{ "name" : "Andre" }
'

Facet on the "name.first_letter" field

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1&search_type=count' -d '
{
"facets" : {
"first_letter" : {
"terms" : {
"field" : "name.first_letter"
}
}
}
}
'

{
"hits" : {
"hits" : [],
"max_score" : 0,
"total" : 4
},
"timed_out" : false,
"_shards" : {
"failed" : 0,
"successful" : 5,
"total" : 5
},
"facets" : {
"first_letter" : {
"other" : 0,
"terms" : [
{
"count" : 2,
"term" : "a"
},
{
"count" : 1,
"term" : "c"
},
{
"count" : 1,
"term" : "b"
}
],
"missing" : 0,
"_type" : "terms",
"total" : 4
}
},
"took" : 1
}
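
Because of the lowercase filter, the facet terms come back as lowercase letters in count order, while the original question asked for an A-Z style listing. One possible client-side step, sketched in Python, turns the "facets" part of a response like the one above into an uppercase A-Z table with zero counts for letters that did not occur. The function name alphabet_counts and the zero-filling are choices made for this example, not something the thread prescribes.

```python
import string


def alphabet_counts(response, facet_name="first_letter"):
    """Map a terms facet result to {"A": n, ..., "Z": n}, filling gaps with 0."""
    terms = response["facets"][facet_name]["terms"]
    found = {entry["term"].upper(): entry["count"] for entry in terms}
    return {letter: found.get(letter, 0) for letter in string.ascii_uppercase}


# Trimmed copy of the facet response shown above.
response = {
    "facets": {
        "first_letter": {
            "_type": "terms", "missing": 0, "total": 4, "other": 0,
            "terms": [
                {"term": "a", "count": 2},
                {"term": "c", "count": 1},
                {"term": "b", "count": 1},
            ],
        }
    }
}
counts = alphabet_counts(response)
print({letter: n for letter, n in counts.items() if n})
```

Note that the terms facet returns only a limited number of terms by default (10, as far as I recall), so a request meant to cover the whole alphabet would likely need a larger "size".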

And search on the main "name" field:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1' -d '
{
"query" : {
"match" : {
"name" : "alex"
}
}
}
'

{
"hits" : {
"hits" : [
{
"_source" : {
"name" : "Alex"
},
"_score" : 0.30685282,
"_index" : "test",
"_id" : "CyJn4ixDRwy17Yl-UxSJwQ",
"_type" : "test"
}
],
"max_score" : 0.30685282,
"total" : 1
},
"timed_out" : false,
"_shards" : {
"failed" : 0,
"successful" : 5,
"total" : 5
},
"took" : 4
}

clint

Thanks a lot for the help.
