All fields are mapped as text

Hello,
I wasn't sure whether this is more of an Elasticsearch or a Logstash issue; I hope this is the right place. As you can probably guess, I am rather new to Elasticsearch.
This is Elasticsearch 6.8 on CentOS 7.7.

I have a CSV that I'm parsing in Logstash using the file input plugin, the csv filter, and the elasticsearch output. It works okay and I'm seeing the parsed values in Kibana. The problem is that all fields end up with the text data type, when in reality most of them are numbers. This means I cannot create any meaningful visualizations. It was my understanding that through auto-mapping, Logstash (or Elasticsearch?) would recognize that a number is a number, but in my case all fields are text.

My data file looks like this:

datetime,timestamp,processname,pid,cpu_time,mem_vms,mem_rss
2020-04-01 00:00:00.473727,1585699200.47,housekeeper,24835,20,8830976,53592064

(and obviously many more lines)
As you can see, there is a date (which I can parse using Logstash's date filter) and one text field (processname); the rest are numbers.

How can I tell Logstash (or Elasticsearch?) that these are numbers?
Bonus question: there are other CSV files with lots and lots of columns, whose name or order may change over time. How can I handle this as elegantly as possible? (I was really hoping the auto-mapping would work...)

Relevant parts of my logstash config (can post in full if necessary):

input {
    file {
        path => ["/tmp/*mpmon*"] 
        mode => "read"
        file_completed_action => "log"
        file_completed_log_path => "/dev/null"
        sincedb_path => "/dev/null"
        start_position => "beginning"
    }
}
filter {
    csv {
        autodetect_column_names => true
    }
}
output {
    elasticsearch {
        hosts => "localhost:9200"
        index => "procmon-%{+YYYY.MM.dd}"
    }
}

I can think of two ways. Either use Logstash's mutate filter with convert:

  mutate {
    convert => { "fieldname" => "integer" }
  }
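Applied to the numeric columns in your sample CSV, that might look something like this (just a sketch, adjust field names and types to your data; timestamp looks like a float, the rest like integers):

  mutate {
    convert => {
      "pid"       => "integer"
      "cpu_time"  => "integer"
      "mem_vms"   => "integer"
      "mem_rss"   => "integer"
      "timestamp" => "float"
    }
  }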

Or use an index template that matches "procmon-*" and defines the field mappings.

However, you can't change the mapping of an existing index, so it's good that you're using a daily index; the new mapping will apply from the next day's index onward. You will likely get conflicts in the Kibana index pattern after the changes.

Thanks!

Forgive my ignorance, I've come across the term index template before, but haven't really found resources to explain the concept or how to implement it. Do you maybe know where I can read more about this?

Also, does that mean that the auto-mapping I've read about is either something completely different, or will not recognize numbers?

You will likely get conflicts in the Kibana index pattern after the changes.

Good to know; in this case it's not a worry, as it's just a dev machine with test data.

Edit: Regarding documentation, I've found the API doc, but that does not really help me; I would like to understand how to apply index templates to my specific setup. A good tutorial with a practical example would be really helpful.

I think the dynamic mapping gets the implied type from the JSON: if it sees "pid": "1234", it's text because of the quotes; if it sees "pid": 1234, it's numeric. I think that's basically what the numeric convert does, it changes the JSON that gets sent. (It's been a while, but Logstash's stdout output shows the difference, and it's likely more complicated than that simple example.)
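To make that concrete, here's a rough sketch (not actual Logstash output) of the document that gets sent to Elasticsearch:

without convert:  { "processname": "housekeeper", "pid": "24835", "cpu_time": "20" }
with convert:     { "processname": "housekeeper", "pid": 24835, "cpu_time": 20 }

Dynamic mapping maps the quoted values as text and the unquoted ones as long.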

There is a template tutorial, not from Elastic. Basically, most things you can set on an index can also be put in a template, which then gets applied to future indices whose names match the pattern.
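For your indices, a minimal sketch in Elasticsearch 6.x legacy template syntax could look roughly like this (the mapping type name "doc" is an assumption on my part, check what your Logstash output actually writes, and adjust fields and types to your data):

PUT _template/procmon
{
  "index_patterns": ["procmon-*"],
  "mappings": {
    "doc": {
      "properties": {
        "processname": { "type": "keyword" },
        "pid":         { "type": "long" },
        "cpu_time":    { "type": "long" },
        "mem_vms":     { "type": "long" },
        "mem_rss":     { "type": "long" },
        "timestamp":   { "type": "double" }
      }
    }
  }
}

A template only affects indices created after it exists, so the next daily index will pick it up.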

You can also look at the templates that automatically get loaded, particularly filebeat.

That tutorial is exactly what I was looking for, thanks again - you've helped me a lot!

Regarding the problem at hand, your explanation with the JSON quotes absolutely makes sense. As mentioned, I'd really like a 'dynamic', abstract solution and would prefer not to hard-code a type for each column. So I implemented a short Ruby script that auto-detects numbers and, if applicable, converts them. I hope the performance penalty is not too high. It looks like this:

# Called once when the ruby filter plugin loads the script.
def register(params)
end

# Called for every event: string values that look like numbers are converted
# to integers or floats, so Elasticsearch's dynamic mapping picks a numeric type.
def filter(event)
  event.to_hash.each do |key, value|
    # Skip Logstash metadata fields (@timestamp, @version) and non-string values.
    next if key.start_with?('@') || !value.respond_to?(:strip)

    val = value.strip
    if val =~ /\A\d+\z/            # whole number, e.g. "24835"
      event.set(key, val.to_i)
    elsif val =~ /\A\d+\.\d+\z/    # decimal number, e.g. "1585699200.47"
      event.set(key, val.to_f)
    end
  end

  [event]
end

If the style is a bit crude, that's because I just started to learn Ruby 2 weeks ago :slight_smile:

Anyway, in pipelines where I expect numbers that I want to use in analysis, like the CSV one I mentioned (procmon), I simply instantiate the filter using the ruby filter plugin and I get my numbers. At least for small sets of sample data this seems to work quite well.
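For completeness, the script is hooked into the pipeline with the ruby filter plugin's path option, roughly like this (the script path is just an example of where I keep it):

filter {
    csv {
        autodetect_column_names => true
    }
    ruby {
        path => "/etc/logstash/scripts/numeric_convert.rb"
    }
}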

Now I'll have to learn about index templates :slight_smile:
