Unexpected default mapping


(Khoa Nguyen) #1

Using Logstash, I parsed a CSV file and loaded the records into Elasticsearch. It looks like all numbers are treated as strings. Is there a way to tell Elasticsearch to treat them as numbers without defining a custom mapping? Each CSV record has hundreds of fields, so I am trying to avoid the tedious and error-prone task of writing a custom mapping.


(Harlin) #2

Numbers should be treated as longs in Elasticsearch by default. At least, this is the behavior I have always seen. How are you passing the documents? Are you perhaps adding quotes around the numbers where there shouldn't be any?


(Mark Walkom) #3

Quoted numbers will be treated as strings :frowning:
It's always best to use Logstash's convert option and explicitly set them as integers.


(Khoa Nguyen) #4

There are no quotes in the CSV file. Below is my logstash configuration. Fields like spkgOctets, duration, etc. are plain numbers.

input {
    stdin {}
}

filter {
    if [message] =~ /.+XDR_ACCT_SUMMARY.+/ {
            csv {
                    columns => [header, recordTime, UTC, sessionId, col5,
                                imsi, msisdn, xdrType, apn, operatorId,
                                roamingGroup, subsIp, sgsnIp, ggsnIp, 
                                spkgId, spkgStartTime, spkgTier, spkgCredits,
                                spkgOctets, duration, timeStamp, technology, col23,
                                carrierId, sessStartTime, sessStopTime, cycleDay,
                                bcsd, sessCredits, sessOctets, probeType,
                                homeMcc, homeMnc, currencyCode, bundleGroup,
                                bundleId, bundleVol1, bundleCurrency1,
                                bundleVol2, bundleCurrency2, 
                                bundleVol3, bundleCurrency3, 
                                bundleVol4, bundleCurrency4, 
                                isProrate, padUsage, extnData, probeData
                                ]
                    separator => "&"
                    remove_field => [ "col5", "col23", "column49" ]
            }
            # Use session stop time as timestamp for this event
            date {
                    match => ["sessStopTime", "yyyyMMddHHmmss"]
                    timezone => "UTC"
            }

            fingerprint {
                    method => "SHA1"
                    key => message
            }

            geoip {
                    source => "sgsnIp"
            }
    }
}

output {
    stdout { codec => dots}
    elasticsearch { 
            host => localhost 
            cluster => "rm_cluster_dev"
            document_id => "%{fingerprint}"
    }
}

(Christian Dahlqvist) #5

The csv filter parses all fields as strings and does not perform any automatic type conversion, so you will need to convert the numeric fields explicitly using the mutate filter.
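
As a rough sketch of what that could look like with the configuration posted above: a mutate block placed after the csv filter, inside the same conditional. Only spkgOctets and duration were named as plain numbers in this thread; treating sessOctets as an integer too is an assumption, so adjust the list to match the real data. Older Logstash releases may expect the array form, convert => [ "spkgOctets", "integer" ], instead of the hash shown here.

```
filter {
    if [message] =~ /.+XDR_ACCT_SUMMARY.+/ {
            # ... csv filter as in the original configuration ...

            # Convert fields the csv filter emitted as strings.
            # spkgOctets and duration come from the thread; sessOctets
            # is an assumed numeric field, listed for illustration only.
            mutate {
                    convert => {
                            "spkgOctets" => "integer"
                            "duration"   => "integer"
                            "sessOctets" => "integer"
                    }
            }
    }
}
```

Elasticsearch fixes a field's type the first time it sees that field, so documents already indexed with string values will likely need a fresh index (or a reindex) before the numeric type takes effect.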

