Problem displaying a string that contains "-"


(pitt) #1

Hello,

When I import data into Elasticsearch, I have a problem with a string that contains "-".

Example: TEST-1

This value (mapped as type string in the index) appears correctly in Discover as "TEST-1", but when I build a data table visualization and split rows on Terms, the string that appears is just TEST instead of TEST-1.

Do you know how I can correct this problem?


(Joe Fleming) #2

The problem is that the field is being analyzed. That is, Elasticsearch analyzes the field and breaks it apart into separate terms, and it may also filter out specific keywords.
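
To make the splitting concrete, here is a minimal Python sketch. It is only a rough stand-in for an analyzed string field, not Elasticsearch's actual analyzer: it assumes the analyzer splits on anything that is not a letter or digit and lowercases each token. The function names are made up for illustration.

```python
import re

def approx_analyzed_terms(text):
    """Rough approximation of an analyzed string field: split on any
    character that is not a letter or digit, then lowercase each token."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [t.lower() for t in tokens]

def not_analyzed_terms(text):
    """A not_analyzed field indexes the whole value as one single term."""
    return [text]

print(approx_analyzed_terms("TEST-1"))
print(not_analyzed_terms("TEST-1"))
```

A Terms aggregation buckets on the indexed terms, which is why Discover (which shows the original document) still displays TEST-1 while the table shows only a fragment of it.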

You need to define the mapping for that field as not analyzed and reindex your data.

You can read more about this in the Elasticsearch documentation on the string mapping type.


(pitt) #3

Is it possible to force a field to be not analyzed with Logstash?

In my configuration, it is Logstash that creates the index structure.


(Joe Fleming) #4

Logstash actually already does this using dynamic templates. Any string field that gets indexed also gets indexed in a non-analyzed way in a .raw field.

For example, if you are indexing text into a field called hostname, then you can query hostname.raw to get the non-analyzed version of that text. No re-indexing is required, this is just something Logstash does by default, so it's already in your data.
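
For reference, the dynamic template in question looks roughly like the structure below. This is a sketch written from memory of the 2.x-era default template, expressed as a Python dict so it can be printed as JSON. Exact settings such as `ignore_above` may differ between versions, and the `raw_name` helper is hypothetical.

```python
import json

# Approximation (an assumption, not copied from a real install) of the
# dynamic template applied to every string field by default.
string_fields_template = {
    "string_fields": {
        "match": "*",
        "match_mapping_type": "string",
        "mapping": {
            "type": "string",
            "index": "analyzed",
            # The extra sub-field keeps the untouched value as one term.
            "fields": {
                "raw": {
                    "type": "string",
                    "index": "not_analyzed",
                    "ignore_above": 256,
                }
            },
        },
    }
}

def raw_name(field):
    """Hypothetical helper: name of the non-analyzed sub-field."""
    return field + ".raw"

print(json.dumps(string_fields_template, indent=2))
print(raw_name("hostname"))
```

In Kibana you would then pick the `.raw` variant of the field in the Terms aggregation instead of the analyzed one.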

If you don't ever want to store the analyzed version, that's almost certainly something Logstash can do, but I really don't know Logstash and I can't tell you how to do it. It will most likely involve defining your own mapping, at least for the field(s) you care about.


(pitt) #5

I don't have any .raw fields in my index, so I can't use them in the visualization.

This is my logstash configuration:

input {
	file {
		path => "E:/TDB/Datasource/Arch_Output/arch_centera_capa_pool*.csv"
		type => "core2"
		start_position => "beginning"
		sincedb_path => "E:/TDB/Datasource/Arch_Sincedb/mco_arch_01_centera_capa_pool.db"
	}
}

filter {
	csv { 
		columns => [ 
			"DATE",
			"CENTERA",
			"NAME",
			"SITE",
			"BATIMENT",
			"ROLE",
			"MONDE",
			"POOL",
			"IDPOOL",
			"QUOTAA",
			"QUOTAHS",
			"USED",
			"CCLIPS",
			"FILES",
			"USED%QUOTAA",
			"USED%QUOTAHS"
			
		]
		separator => ";"
	}
	if ([DATE] == "#SKIP") {
		drop { }
	}
	mutate {
		# Convert the numeric columns to floats; IDPOOL stays a string.
		convert => {
			"FILES" => "float"
			"CCLIPS" => "float"
			"USED" => "float"
			"QUOTAHS" => "float"
			"QUOTAA" => "float"
			"USED%QUOTAA" => "float"
			"USED%QUOTAHS" => "float"
			"IDPOOL" => "string"
		}
	}
	date {
		match => [ "DATE", "yyyy-MM-dd HH:mm:ss" ]
	}
}

output {
	stdout {}
	elasticsearch { 
		document_id => "%{@timestamp}_%{CENTERA}_%{SITE}_%{BATIMENT}_%{MONDE}_%{POOL}"
		hosts => ["192.168.0.1:9200"] 
		action => "index"
		index => "index_arch_centera_capa_pool"
	}
}

(Joe Fleming) #6

Do you have a mapping template set up for the index_arch_centera_capa_pool index? Logstash should create its dynamic template by default, but I believe that if you have your own template defined, the dynamic template isn't applied.

If you do have your own template defined, then you'll need to add another field that's a non-analyzed version of the field you want to use in Kibana, or set the original field to not_analyzed if you don't ever care to analyze the contents of that field.

Bottom line, that data needs to be indexed into a not_analyzed field in order for it to not split that value on the "-".
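
As a starting point, here is a minimal sketch of an index template body for that index, built as a Python dict and printed as JSON. It assumes Elasticsearch 2.x-style mappings (`string` with `index: not_analyzed`). The field name POOL is only a guess at which column holds values like TEST-1, and `build_index_template` is a hypothetical helper. As far as I know, the default Logstash template is registered for indices matching `logstash-*`, which could also explain why a custom index name did not get .raw fields.

```python
import json

def build_index_template(index_pattern, fields):
    """Build a 2.x-style index template body that maps each named string
    field as not_analyzed, so its value is indexed as one single term."""
    properties = {
        name: {"type": "string", "index": "not_analyzed"} for name in fields
    }
    return {
        "template": index_pattern,  # which index names the template applies to
        "mappings": {"_default_": {"properties": properties}},
    }

# "POOL" is an assumed field name; replace it with the affected field(s).
body = build_index_template("index_arch_centera_capa_pool", ["POOL"])
print(json.dumps(body, indent=2))
```

The printed JSON could be registered through the index template API before the data is loaded. Because the mapping of an existing field cannot be changed in place, the index would need to be deleted or reindexed for the new mapping to take effect.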

