Is it possible to use comma-separated list data as terms in Kibana?

Hi,

I've tried searching but can't find a way around what seems to be a rather basic use case, so please point me to a suitable thread if there is one. If not, here is my problem:

I want to use the ELK stack to visualize the contents of data in XML files. In particular, there are some fields (XML attributes) that consist of comma-separated lists that I would like to use for categorizing the data.

I have the whole thing set up and the data flows all the way through, but I only get those fields into ES as either a text ("searchable") or keyword (aggregatable) field. So either I can search on the parts of the comma-separated list (not really what I need; I want terms in Kibana, not filters), or I can aggregate on the whole list rather than the individual comma-separated parts, which isn't what I'm after either.

I have created a custom comma-separation analyzer, but that only affects the text part, while for the keyword part I can't specify an analyzer, of course. (There is the normalizer, but the docs say it has to produce a single token per value.)

It seems to be possible to split the field into different fields, or into different documents, but neither suits this use case (several documents would make a lot of other statistics much more complex, while different fields don't allow the same kind of easy filtering in the Kibana UI).

What I'm after is this: given a comma-separated list in a field in the input data, containing say between 0 and 100 values that I don't know beforehand (both the values and the number of values vary between rows/documents), how can I make those values show up in a Kibana "terms" list when creating bar charts or data tables (for easy click-filtering)?

Thank you for any advice!

You can index an array of values into a field, and the field will show up in the terms list when building a bar chart. But are you looking for the actual values of the array to show up in the terms list?

I created a test index with a values field which contains an array of integer values.
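
Something along these lines (a minimal sketch; index, type, and field names are just placeholders):

curl -XPUT 'localhost:9200/test/doc/1' -H 'Content-Type: application/json' -d '{ "values": [1, 2, 3] }'
curl -XPUT 'localhost:9200/test/doc/2' -H 'Content-Type: application/json' -d '{ "values": [2, 3, 4] }'

Each element of the array is indexed as a separate term for the field, so a terms aggregation sees the individual values.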


And I can create a bar chart with that field:

I can then filter on a value, but it will filter to the documents where that value exists in the array, so you will see the other values in the array as well.
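
Under the hood the bar chart is just a terms aggregation on the field, so you can verify the behavior directly (same placeholder names as above):

curl -XGET 'localhost:9200/test/_search?pretty' -H 'Content-Type: application/json' -d '
{
	"size": 0,
	"aggs": {
		"value_terms": {
			"terms": { "field": "values" }
		}
	}
}'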

If this is not what you are looking for, can you provide a bit more explanation?

Thank you!

I think I tried this but must have made a mistake in Logstash, causing the field not to appear or not be treated properly (as aggregatable) in ES/Kibana. I hadn't thought about the filter behavior, but I understand that would be the consequence when the values are in the same document(s). I think that is acceptable, though, since the main use case is a quick way to filter down to documents containing one or a few of the values (perhaps using a list instead of a bar chart).

Still, I can't seem to make it work, though as you have shown, it must be an error on my side. The input data is a simple XML attribute looking like:

<event ... attributeX="value1, value2, value3" ... />  

What I do is, first in Logstash, use the xml plugin with the line:

xpath => ["//Event/@attributeX", "attributeX"] 

(this gets the raw attribute string properly, and also works for other attributes)
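
In context the filter looks roughly like this (simplified; the source field depends on how the input is read):

xml {
	source => "message"
	# only extract the xpath matches, don't store the whole parsed tree
	store_xml => false
	xpath => ["//Event/@attributeX", "attributeX"]
}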

and later on

mutate {
    split => { "attributeX" => "," }
}

which seems to work for others creating arrays, judging from what I've found on the web. In the ES mapping/template I've tried both a plain

"attributeX": {
	"type": "keyword"				
},

and also

"attributeX": {
	"type": "text",
	"analyzer": "comma_separation_analyzer",
	"term_vector" : "yes",
	"fields": {
		"keyword": {
			"type": "keyword"
		}
	}					
},

(where I've created the analyzer and tokenizer...).
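
For reference, the analysis part of the index settings is along these lines (a minimal version: a pattern tokenizer that splits on commas, plus the built-in trim filter to drop the surrounding whitespace):

"settings": {
	"analysis": {
		"tokenizer": {
			"comma_tokenizer": {
				"type": "pattern",
				"pattern": ","
			}
		},
		"analyzer": {
			"comma_separation_analyzer": {
				"type": "custom",
				"tokenizer": "comma_tokenizer",
				"filter": ["trim"]
			}
		}
	}
},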

But still no luck; I only get the whole "value1, value2, value3" strings when using the field in a terms aggregation in Kibana, probably because it still isn't indexed properly. When running a curl query against ES for a document, the response for the field still looks like:

"attributeX" : [
	"value1, value2, value3"
],

(and not what I guess would be the "correct" representation: "attributeX" : ["value1", "value2", "value3"])

Any ideas what I can do when indexing (or when defining the mapping) to index it as an array? (ELK stack version 5.4.2.)
(Also, I guess I'll try to move this to the Logstash tag.)

I suggest asking a new question in the Logstash channel specific to the array question. Thanks!

Thanks, I changed the thread tag, but now I've found the problem:

Using the XML filter, the output is an array for all fields (there is a force_array => false option, which seems to make no difference), so when I tried to use mutate-split, it didn't do anything, since the data was already an array of one element. First running join (with no join character) and then split works. It's a bit ugly, but for now this is a proof of concept to figure out throughput, data sizes etc. for the pipeline, not production, so it will suffice.
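
For anyone hitting the same thing, the workaround looks roughly like this (two separate mutate blocks, so the join is guaranteed to run before the split; splitting on ", " also avoids leading spaces in the values, adjust if your separator differs):

mutate {
	# the xml filter's xpath output is a one-element array: ["value1, value2, value3"]
	# joining with an empty string collapses it to a plain string
	join => { "attributeX" => "" }
}
mutate {
	# now splitting yields ["value1", "value2", "value3"]
	split => { "attributeX" => ", " }
}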

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.