Counting whole strings, not individual words in the string

(Christopher Curzon) #1

I'm looking at a data file something like this:

shipid status
ship01, in harbor
ship02, in transit
ship03, moored
ship04, in transit
ship05, in transit

Now, using an aggregation query like this

"aggs": {
    "ship_agg": {
        "terms": {
            "field": "status"
        }
    }
}

gives me buckets where the individual words are counted:

bucket("in") is 4
bucket("harbor") is 1
bucket("transit") is 3
bucket("moored") is 1
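
Presumably this is because the standard analyzer splits each status value into separate terms, which the terms aggregation then counts. For example (using the `_analyze` API to check):

    POST _analyze
    {
        "analyzer": "standard",
        "text":     "in harbor"
    }

returns the two tokens `in` and `harbor`, not the single term `in harbor`.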

But what I need are counts of the whole status field, where each value is treated as a single term:

bucket("in harbor") is 1
bucket("in transit") is 3
bucket("moored") is 1

Can you suggest how my "aggs" clause can be changed to do this?


-- Chris Curzon

(Mike Simos) #2

You need to change the field, or add another field, with its index set to not_analyzed:

    "status": {
        "type":  "string",
        "index": "not_analyzed"
    }
You can also use multi-fields to have status & status.raw:
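A multi-field mapping along these lines keeps `status` analyzed for full-text search while `status.raw` stays not_analyzed for aggregations (the sub-field name `raw` is just a convention — pick whatever name suits you):

    "status": {
        "type": "string",
        "fields": {
            "raw": {
                "type":  "string",
                "index": "not_analyzed"
            }
        }
    }

The terms aggregation would then target `"field": "status.raw"` to get whole-value buckets.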

(Christopher Curzon) #3

Thanks for the reply.

We did try the "not_analyzed" feature, but the results still seemed to drill down to the word.

Maybe I'm not handling the bigger picture correctly. Here's what I'm doing.

------------- config file -------------

input { file ... etc }

filter {
    csv {
        columns => [ ... ]
        separator => ","
    }
}

output { elasticsearch ... etc ... }

I'm not really fluent with the syntax yet. In the Logstash config, where would your suggestion go? Should I add the type and index into the csv column list? Would I change the filter {} clause like this:

filter {
csv {
columns => [
"status" : { "type": "string", "index": "not_analyzed" }
"rec_date" ]
separator => ","

but that doesn't pass --configtest in Logstash, so I'm not sure how to proceed.


(Mike Simos) #4


You can have two fields: one analyzed and one not_analyzed. For full-text queries, use the analyzed field; for aggregations, use the not_analyzed field.

The mapping change does not go in the Logstash csv filter — you need to update your index mapping in Elasticsearch.
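
As a sketch (assuming your indices match a pattern like `shipdata-*` — adjust the template and index names to your setup), you could register an index template so every new index maps `status` as a multi-field:

    PUT _template/shipdata
    {
        "template": "shipdata-*",
        "mappings": {
            "_default_": {
                "properties": {
                    "status": {
                        "type": "string",
                        "fields": {
                            "raw": { "type": "string", "index": "not_analyzed" }
                        }
                    }
                }
            }
        }
    }

Your Logstash config stays as it is; only the aggregation changes, to point at the raw sub-field:

    "aggs": {
        "ship_agg": {
            "terms": { "field": "status.raw" }
        }
    }

Note that the mapping only applies to newly created indices, so existing data would need to be reindexed.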
