Using custom analyzer / tokenizers to breakdown a string into subfields

feicipet · August 25, 2016, 10:43am

Hi,

I am receiving a CSV report of PC specifications, which I'm importing into ES using Logstash.

One of the fields in the report that I receive comes in the following format:

C: (Used '14.51'GB of '80.01'GB , '18.13'%), D: (Used '42.42'GB of '385.75'GB , '11'%)

The number of drives is dynamic: it depends on how many drives the user's PC has.
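Because the number of drives varies, one way to pull the entries out is a single regex for one drive, applied repeatedly across the string. The sketch below is a minimal Python illustration, assuming every entry has exactly the shape shown above (`X: (Used 'N'GB of 'N'GB , 'N'%)`); the pattern `DRIVE_RE` and the function `extract_drives` are hypothetical names, not part of Logstash or Elasticsearch.

```python
import re

# Hypothetical pattern for ONE drive entry, assuming the format
#   X: (Used 'N'GB of 'N'GB , 'N'%)
# Named groups become the keys of each result dict.
DRIVE_RE = re.compile(
    r"(?P<drivename>[A-Z]):\s*\(Used\s*'(?P<used>[\d.]+)'GB\s+of\s+"
    r"'(?P<size>[\d.]+)'GB\s*,\s*'(?P<percentage>[\d.]+)'%\)"
)


def extract_drives(field: str) -> list[dict[str, str]]:
    """Return one dict per drive, however many drives the string contains."""
    # finditer scans left to right, so drives come out in report order
    return [m.groupdict() for m in DRIVE_RE.finditer(field)]


report = "C: (Used '14.51'GB of '80.01'GB , '18.13'%), D: (Used '42.42'GB of '385.75'GB , '11'%)"
for drive in extract_drives(report):
    print(drive)
```

Since the pattern describes a single drive rather than the whole field, a PC with one drive or five drives needs no change to it.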

Instead of just storing this string, I would like to be able to store it in the following format:

"disks": [
	{
		"drivename": "C",
		"used": "14.51",
		"size": "80.01",
		"remaining": "65.5",
		"percentage": "18.13"
	},
	{
		"drivename": "D",
		"used": "42.42",
		"size": "385.75",
		"remaining": "343.33",
		"percentage": "11"
	}
]

The "remaining" values are not in the source string and need to be calculated as size minus used.
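The "remaining" calculation is just size minus used per drive. A minimal Python sketch, assuming the parsed values are kept as strings as in the target format above; `with_remaining` is a hypothetical helper. It uses `Decimal` rather than `float` so that the subtraction does not pick up binary floating-point rounding noise.

```python
from decimal import Decimal


def with_remaining(drivename: str, used: str, size: str, percentage: str) -> dict[str, str]:
    """Build one entry of the "disks" array, adding remaining = size - used."""
    # Decimal keeps the arithmetic exact for values like "80.01" and "14.51"
    remaining = Decimal(size) - Decimal(used)
    return {
        "drivename": drivename,
        "used": used,
        "size": size,
        # normalize() drops trailing zeros (65.50 -> 65.5); the "f" format
        # keeps plain notation instead of e.g. "1E+2" for 100
        "remaining": format(remaining.normalize(), "f"),
        "percentage": percentage,
    }


disks = [
    with_remaining("C", "14.51", "80.01", "18.13"),
    with_remaining("D", "42.42", "385.75", "11"),
]
print(disks)
```

With the sample values this gives 65.5 for C (80.01 - 14.51) and 343.33 for D (385.75 - 42.42), matching the figures above.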

I think it could be done by defining the disks field with a custom analyzer that uses a pattern tokenizer to break the string down, but the examples I have found in the documentation are too simple for me to make much headway. Is there a more complex example of splitting a string into multiple fields that I can refer to?
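To see what a pattern tokenizer would produce from this field, here is a rough Python emulation: split on a separator regex and drop empty pieces. The separator `[^\w.]+` is an assumption chosen so that decimals such as 14.51 stay whole; it is not Elasticsearch's default pattern, and `pattern_tokens` is a hypothetical name.

```python
import re

# Assumed separator: any run of characters that is neither a word
# character nor a dot, so "14.51" survives as one token.
SEPARATOR = re.compile(r"[^\w.]+")


def pattern_tokens(text: str) -> list[str]:
    """Split text on SEPARATOR and discard empty strings, like a tokenizer would."""
    return [tok for tok in SEPARATOR.split(text) if tok]


report = "C: (Used '14.51'GB of '80.01'GB , '18.13'%), D: (Used '42.42'GB of '385.75'GB , '11'%)"
print(pattern_tokens(report))
```

The result is a flat list of strings. Nothing in it records which number is "used" and which is "size", or which drive a number belongs to. An analyzer also only affects the terms that are indexed for searching; the stored document keeps the original string. Producing a structured "disks" array therefore means splitting the string before it reaches Elasticsearch.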

If anyone thinks that there's a better way to do this, I'm all ears too.

Thank you!
Wong

ywelsch · August 25, 2016, 11:08am

As you are using Logstash to do the import, the simplest way is to do the conversion into the desired format there:

https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
