Counting whole strings, not individual words in the string


(Christopher Curzon) #1

I'm looking at a data file that looks something like this:

shipid status
ship01, in harbor
ship02, in transit
ship03, moored
ship04, in transit
ship05, in transit

Using an aggregation query like this:

"aggs": {
    "ship_agg": {
        "terms": {
            "field": "status"
        }
    }
}

gives me buckets in which the individual words are counted:

bucket("in") is 4
bucket("harbor") is 1
bucket("transit") is 3
bucket("moored") is 1
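Roughly why this happens: a string field is analyzed by default, so each value is lowercased and split into words before indexing, and the terms aggregation then buckets those words. The Python sketch below reproduces the buckets above using a simple lowercase-and-split stand-in for the standard analyzer (an approximation, not Elasticsearch's actual tokenizer). Each term is counted once per document, which matches how a terms bucket's doc count works.

```python
import re
from collections import Counter

# Status values from the sample data file.
STATUSES = ["in harbor", "in transit", "moored", "in transit", "in transit"]

def approx_standard_tokens(text):
    """Rough stand-in for the standard analyzer: lowercase, then split into words."""
    return re.findall(r"\w+", text.lower())

def analyzed_term_counts(values):
    """Bucket words the way a terms aggregation sees an analyzed field."""
    counts = Counter()
    for value in values:
        # A word is counted once per document, even if it repeats in the value.
        counts.update(set(approx_standard_tokens(value)))
    return counts

print(sorted(analyzed_term_counts(STATUSES).items()))
# [('harbor', 1), ('in', 4), ('moored', 1), ('transit', 3)]
```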

But what I need are counts of whole status values, with each value treated as a single term:

bucket("in harbor") is 1
bucket("in transit") is 3
bucket("moored") is 1
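For comparison, counting whole values (what a not_analyzed field gives a terms aggregation) looks like this. This is only an illustration in Python using the stdlib csv module, not how Logstash or Elasticsearch process the file; skipinitialspace=True is an assumption made here to drop the blank after each comma in the sample rows.

```python
import csv
import io
from collections import Counter

# Sample rows from the question (header line omitted).
DATA = """ship01, in harbor
ship02, in transit
ship03, moored
ship04, in transit
ship05, in transit
"""

def whole_value_counts(text):
    """Bucket each status as one term, the way a not_analyzed field is aggregated."""
    # skipinitialspace strips the space after each comma; without it the
    # values would be " in harbor", " in transit", and so on.
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return Counter(status for _shipid, status in rows)

print(sorted(whole_value_counts(DATA).items()))
# [('in harbor', 1), ('in transit', 3), ('moored', 1)]
```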

Can you suggest how my "aggs" clause can be changed to do this?

Thanks.

-- Chris Curzon


(Mike Simos) #2

You need to change the field, or add another field, with "index" set to not_analyzed:

{
    "status": {
        "type":     "string",
        "index":    "not_analyzed"
    }
}

https://www.elastic.co/guide/en/elasticsearch/reference/2.1/string.html

You can also use multi-fields to have both status and status.raw:

https://www.elastic.co/guide/en/elasticsearch/reference/2.1/multi-fields.html
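A sketch of the multi-field mapping body in Elasticsearch 2.x syntax (string / not_analyzed, as in the linked 2.1 docs), built as a Python dict so it can be checked and printed as JSON. The main field stays analyzed for full-text search, and the "raw" sub-field (the status.raw name used above) keeps the whole value for aggregations.

```python
import json

def multi_field_mapping(field):
    """Put-mapping body (Elasticsearch 2.x syntax): an analyzed `field` plus a
    not_analyzed `raw` sub-field holding the whole value."""
    return {
        "properties": {
            field: {
                "type": "string",  # analyzed by default: use for full-text queries
                "fields": {
                    # exact, untokenized copy: use for terms aggregations
                    "raw": {"type": "string", "index": "not_analyzed"}
                },
            }
        }
    }

print(json.dumps(multi_field_mapping("status"), indent=2))
```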


(Christopher Curzon) #3

Thanks for the reply.

We did try "not_analyzed", but the results were still broken down by word.

Maybe I'm not handling the bigger picture correctly. Here's what I'm doing.

------------- config file -------------

input { file ... etc }

filter {
  csv {
    columns => [
      "shipid",
      "status",
      "rec_date"
    ]
    separator => ","
  }
}

output { elasticsearch ... etc ...}


I'm not really fluent with the syntax yet. In the Logstash config, where would your suggestion go? Should I add the type and index to the csv column list? That is, would I change the filter {} clause like this:

filter {
csv {
columns => [
"shipid",
"status" : { "type": "string", "index": "not_analyzed" }
"rec_date" ]
separator => ","
}
}

but that doesn't pass --configtest in Logstash, so I'm not sure how to proceed.

Thanks.


(Mike Simos) #4

Hi,

You can have two fields, one analyzed and one not_analyzed. Use the analyzed field for queries and the not_analyzed field for aggregations.
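As an illustration of that split, here is a search body (built in Python) that runs a full-text match on the analyzed status field while bucketing on status.raw. It assumes the multi-field mapping with a "raw" sub-field described earlier in the thread; "ship_agg" is the aggregation name from the original question, and size 0 simply skips returning hits.

```python
import json

def status_search_body(word):
    """Match `word` in the analyzed `status` field and bucket whole values
    from the not_analyzed `status.raw` sub-field."""
    return {
        "size": 0,  # only the aggregation buckets are wanted, not the hits
        "query": {"match": {"status": word}},  # analyzed: matches individual words
        "aggs": {
            "ship_agg": {
                "terms": {"field": "status.raw"}  # not_analyzed: whole values
            }
        },
    }

print(json.dumps(status_search_body("transit"), indent=2))
```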

The mapping is defined in Elasticsearch, not in the Logstash csv filter, so you need to update your index mapping there:

https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html
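A minimal sketch of what such a put-mapping call looks like as an HTTP request (Elasticsearch 2.x form: PUT /{index}/_mapping/{type}), using only Python's urllib. The host, index name "ships", and type name "ship" are hypothetical placeholders; the request is built but not sent.

```python
import json
import urllib.request

def put_mapping_request(host, index, doc_type, mapping):
    """Build (but do not send) an Elasticsearch 2.x put-mapping request."""
    url = f"{host}/{index}/_mapping/{doc_type}"
    return urllib.request.Request(
        url,
        data=json.dumps(mapping).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# Hypothetical names: substitute your own host, index, and type.
mapping = {
    "properties": {
        "status": {
            "type": "string",
            "fields": {"raw": {"type": "string", "index": "not_analyzed"}},
        }
    }
}
req = put_mapping_request("http://localhost:9200", "ships", "ship", mapping)
print(req.get_method(), req.full_url)
# PUT http://localhost:9200/ships/_mapping/ship
# urllib.request.urlopen(req) would actually send it.
```

Mapping changes apply only to documents indexed afterwards, so existing documents have to be reindexed before status.raw holds values for them; an existing analyzed field also cannot be switched to not_analyzed in place, which is why adding a sub-field is the usual route.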

