Aggregations over fields which have multiple words in them


(Ankit Jain-2) #1

Hey folks, how do I do aggregations on a multi-word field? For example, if I
have a field named "device model" which could hold "samsung galaxy s5" or
"iphone 5s", then when I do an aggregation it aggregates "samsung" separately
from "galaxy" separately from "s5"... ideas?

I tried defining the index with a mapping where that field has index :
not_analyzed set...

Thanks in advance,
Ankit!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/433ae04a-73b4-462d-9be0-d5f20ff6c42f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(David Pilato) #2

Yes. Using not_analyzed is the way to go.
Maybe you could create a SENSE script and gist it so we can see what you did wrong?

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 3 Jul 2014, at 09:47, Ankit Jain ankit@quettra.com wrote:

Hey Folks, how do I do aggregations on a multi word field? i.e. if i have a field named "device model" which could have "samsung galaxy s5" or "iphone 5s" when I do an aggregation, it aggregates "samsung" separately from "galaxy" separately from "s5"... ideas?

I tried defining the index with a mapping where that field has index : not_analyzed set...

Thanks in advance,
Ankit!



(Ankit Jain-2) #3

Not sure what a SENSE script is but below is how to reproduce this. Let me
know if this makes sense.

Thank you for helping.

-Ankit

ankit$ curl -XPUT http://localhost:9200/test -d '"mappings": { "test": { "properties": { "deviceId": { "type": "string", "index": "not_analyzed" }, "basics": { "type": "nested", "properties": { "sex": { "type": "string", "index": "not_analyzed" }, "device": { "type": "string", "index": "not_analyzed" } } } } } } }'

{"acknowledged":true}

ankit$ curl -XPUT 'http://localhost:9200/test/test/1' -d '{ "basics": { "sex": "m", "device": "LGE LG-P768" }, "deviceId": "1" }'

{"_index":"test","_type":"test","_id":"1","_version":1,"created":true}

ankit$ curl -XPUT 'http://localhost:9200/test/test/2' -d '{ "basics": { "sex": "m", "device": "Samsung SHW-M250S" }, "deviceId": "2" }'

{"_index":"test","_type":"test","_id":"2","_version":1,"created":true}

ankit$ curl -XGET 'http://localhost:9200/test/test/_search?search_type=count&pretty' -d '{ "aggregations": { "popular_devices": { "terms": { "field": "device" } } } }'

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "popular_devices" : {
      "buckets" : [ {
        "key" : "lg",
        "doc_count" : 1
      }, {
        "key" : "lge",
        "doc_count" : 1
      }, {
        "key" : "m250s",
        "doc_count" : 1
      }, {
        "key" : "p768",
        "doc_count" : 1
      }, {
        "key" : "samsung",
        "doc_count" : 1
      }, {
        "key" : "shw",
        "doc_count" : 1
      } ]
    }
  }
}
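
The six lowercase keys in this response look like what an analyzed string field would yield: each value is lowercased and split at whitespace and hyphens, and every resulting token becomes its own bucket. The pipeline below is only a rough stand-in for that behavior, built from `tr`, `sort`, and `uniq` rather than Elasticsearch itself, and applied to the two device values indexed above.

```shell
# Rough imitation of analyzed-field tokenization (an approximation, not
# Elasticsearch's own analyzer): lowercase each value, split on anything that
# is not a letter or digit, then count how often each token occurs.
buckets=$(printf '%s\n' "LGE LG-P768" "Samsung SHW-M250S" \
  | tr '[:upper:]' '[:lower:]' \
  | tr -cs '[:alnum:]' '\n' \
  | sort | uniq -c | awk '{ print $2 " " $1 }')

# One "token count" pair per line.
printf '%s\n' "$buckets"
```

Run on those two values, it lists `lg`, `lge`, `m250s`, `p768`, `samsung`, and `shw`, each with a count of 1, which matches the buckets returned here and suggests the field was being analyzed despite the intended mapping.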

On Thursday, July 3, 2014 1:08:13 AM UTC-7, David Pilato wrote:

Yes. Using not_analyzed is the way to go.
Maybe you could create a SENSE script and gist it so we can see what you
did wrong?



(David Pilato) #4

Strange.

Your script does not work on my end.
I think it may be because you already had a test index before starting your test?
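
One way to check this is to ask Elasticsearch which mapping the `test` index actually holds. It is also worth noting that the create-index body, as posted, begins with `"mappings"` rather than `{`, so its braces do not balance; whether that is why the mapping did not take effect is only a guess. The sketch below counts the braces in the posted body and queries the mapping only when `ES_URL` (an assumed variable, e.g. `http://localhost:9200`) is set.

```shell
# Body of the create-index request exactly as posted in the thread.
BODY='"mappings": { "test": { "properties": { "deviceId": { "type": "string", "index": "not_analyzed" }, "basics": { "type": "nested", "properties": { "sex": { "type": "string", "index": "not_analyzed" }, "device": { "type": "string", "index": "not_analyzed" } } } } } } }'

# Count opening and closing braces; a well-formed JSON object needs equal
# counts (none of the braces here sit inside string values).
opens=$(printf '%s' "$BODY" | tr -cd '{' | wc -c | tr -d ' ')
closes=$(printf '%s' "$BODY" | tr -cd '}' | wc -c | tr -d ' ')
echo "opening braces: $opens, closing braces: $closes"

# ES_URL is an assumption for this sketch; when it is set, show the mapping
# that the index really has.
if [ -n "${ES_URL:-}" ]; then
  curl -XGET "$ES_URL/test/_mapping?pretty"
fi
```

The counts come out as 8 opening and 9 closing braces, so prepending a single `{` to the body would balance it. If the returned mapping does not show `"index": "not_analyzed"` on the device field, the field is being analyzed regardless of what the create request intended.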

Actually, I did not get any bucket as an answer.

And that seems normal to me, as you defined a nested doc.
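
For a field inside a `nested` object, a `terms` aggregation is normally wrapped in a `nested` aggregation that names the path, and the field is referenced by its full name. The request below is a sketch along those lines for the mapping in this thread; the outer aggregation name `basics` and the `ES_URL` variable are assumptions for illustration, and it has not been run against the poster's cluster.

```shell
# Terms aggregation on basics.device, wrapped in a nested aggregation on the
# "basics" path so that the nested documents are visible to it.
QUERY='{
  "aggregations": {
    "basics": {
      "nested": { "path": "basics" },
      "aggregations": {
        "popular_devices": {
          "terms": { "field": "basics.device" }
        }
      }
    }
  }
}'

# ES_URL is an assumption (e.g. http://localhost:9200); without it, the
# sketch only prints the request body instead of sending it.
if [ -n "${ES_URL:-}" ]; then
  curl -XGET "$ES_URL/test/test/_search?search_type=count&pretty" -d "$QUERY"
else
  printf '%s\n' "$QUERY"
fi
```

With this shape, the `popular_devices` buckets appear one level down in the response, under the outer `basics` aggregation.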

This script works fine for me: https://gist.github.com/dadoonet/a04013991566b8a5ec39

It gives:

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "popular_devices": {
      "buckets": [
        {
          "key": "LGE LG-P768",
          "doc_count": 1
        },
        {
          "key": "Samsung SHW-M250S",
          "doc_count": 1
        }
      ]
    }
  }
}

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

On 3 July 2014 at 18:47:51, Ankit Jain (ankit@quettra.com) wrote:

Not sure what a SENSE script is but below is how to reproduce this. Let me know if this makes sense.

Thank you for helping.

-Ankit


