Elastic Dissect

Hi everyone!
How do I split a single cell containing a varying number of strings in a CSV file into separate cells (fields), assigning the field names from the strings themselves? If I manually write the field names and their count (like in the example below), it works only when every string is present in each cell, but in reality the number of strings per cell varies.

Example

Data (CSV file)

ID,Name,Category
1,Desk Chair,"Furniture,Hard furnishings,categoryE7E7163,Hendi,"
2,Sofa,"Furniture,Soft furnishings, Bartscher"

input {
  file {
    path => "E:/test.csv"
    start_position => "beginning"
    sincedb_path => "NULL"
  }
}

filter {
  dissect {
    mapping => {
      "message" => '%{product_id},%{name},"%{category},%{sub_category},%{alt_category},%{brand}"'
      # it's okay for id=1 (Desk Chair) but not right for id=2 (Sofa)
    }
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "test_products"
    user => "elastic"
    password => "**"
  }
  stdout {}
}

What I need stored in Elasticsearch:

"_source": {
"ID": 1,
"Name": "Desk Chair",
"Category": "Furniture",
"Hard furnishings": "Hard furnishings",
"categoryE7E7163": "categoryE7E7163",
"Hendi": "Hendi",
}

"_source": {
"ID": 2,
"Name": "Sofa",
"Category": "Furniture",
"Soft furnishings": "Soft furnishings",
"categoryE7E7163": "categoryE7E7163",
"Bartscher": "Bartscher",
}

What do you mean by:

"it's okay for id=1 (Desk Chair) but not right for id=2 (Sofa)"

What does this produce?

BTW I'm moving your question to Logstash

Hi, David!
At the moment every event comes out tagged "_dissectfailure".
Let me try to rephrase: I need to write a universal mapping for 'message' that splits the strings into separate fields regardless of how many there are, and each field name must match the string value in the cell. Here is the working code:
Data

ID,Name,Category
1,Sofa,"Furniture,Soft furnishings,TestString"
2,Desk Chair,"Furniture,Hard furnishings,TestString2"

input {
  file {
    path => "E:/test.csv"
    start_position => "beginning"
    sincedb_path => "NULL"
  }
}

filter {
  dissect {
    mapping => {
      "message" => '%{product_id},%{name},"%{category},%{sub_category},%{alt_category}"'
    }
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "test_products5"
    user => "elastic"
    password => "**"
  }
  stdout {}
}

The problem is that the Category for one row may contain 3 strings and for another 6, and the names differ, so I decided it is better to take the field names from the strings themselves.
The example above works because I put the same number of strings into Category in the CSV file, but in the real database each row has a different number of strings.

Real Data Example

ID,Name,Category
1,Sofa,"1_brands,1_series,1_smallwares,1_storage_and_transport,2_brand_contacto,2_food_storage_supplies,2_food_storage_supplies_2,2_series_501_1000,3_gastronorm_containers,3_gastronorm_containers_2,3_series_contacto_gn_container_melamin,categoryE7E7163"
2,Desk Chair,"1_brands,1_series,1_smallchair"

I found a ''dot key'' solution which might help me, but I don't know how to implement it. I guess it should look like this:

filter {
  dissect {
    mapping => {
      "message" => '%{product_id},%{name},"%{.},%{.},%{.},%{.},%{.},%{.}"'
    }
  }
}

but it doesn't work, and I also don't know how to deal with a different number of Category strings. To put it simply, I need to somehow save the data from Category into separate JSON fields.

I would not use dissect at all. Try

    csv { autodetect_column_names => true }
    ruby {
        code => '
            # Grab the comma-separated Category value
            c = event.get("Category")
            if c
                a = c.split(",")
                # Keep the first entry as the Category field
                event.set("Category", a.shift)
                # Turn each remaining entry into a field named after itself
                a.each { |x| event.set(x.strip, x.strip) }
            end
        '
    }
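
To see what that does, here is a standalone Ruby sketch of the same logic, hand-traced against the second sample row (the event is stood in for by a plain Hash; this is my illustration, not actual Logstash output):

    # Simulate the filter against row 2: Category = "Furniture,Soft furnishings, Bartscher"
    event = { "Category" => "Furniture,Soft furnishings, Bartscher" }
    c = event["Category"]
    if c
      a = c.split(",")                         # ["Furniture", "Soft furnishings", " Bartscher"]
      event["Category"] = a.shift              # Category keeps the first entry, "Furniture"
      a.each { |x| event[x.strip] = x.strip }  # self-named fields for the rest
    end
    p event
    # {"Category"=>"Furniture", "Soft furnishings"=>"Soft furnishings", "Bartscher"=>"Bartscher"}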

You will need pipeline.ordered to evaluate to true and pipeline.workers to be 1, otherwise autodetect_column_names cannot guarantee that the header row is processed first.
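
For reference, a minimal sketch of those two settings in logstash.yml (pipeline.workers can also be set with the -w command-line flag):

    # logstash.yml
    pipeline.workers: 1    # single worker, so events are processed in order
    pipeline.ordered: true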

In your file input sincedb_path => "NULL" will result in the in-memory sincedb being persisted across restarts in a file called NULL in logstash's working directory. If you do not want it persisted then use sincedb_path => "NUL".
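
In other words, on Windows the input from the question would become something like this (same path as in the original post):

    input {
      file {
        path => "E:/test.csv"
        start_position => "beginning"
        sincedb_path => "NUL"    # Windows null device: nothing is persisted
      }
    }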


Works perfectly! Thank you very much!

Hi man!
Do you have any idea why, when I add your code, only 6873 objects are imported into Elasticsearch instead of 53k? When I remove the ruby code, all the elements are stored. I don't get any errors.

Data

SKU;Kategoria;Rodzina;Marka;Kod u dostawcy;Kod u dostawcy 2;Nazwa towaru
00000290;1_brands,1_restaurant_equipment,2_brand_rm_gastro,2_food_holding_and_warming_equipment,3_steam_heaters_and_buffets,4_bain_marie_heaters,categoryE7E7163;bemary grzewcze;rm gastro;00000290;BMPD 2120;Bemar elektryczny 2-komorowy, GN 1/1, jezdny z kranem spustowym, 1,4 kW;

Code

input {
  file {
    path => "C:/db.csv"
    start_position => "beginning"
    sincedb_path => "NULL"
  }
}

filter {
  csv {
    separator => ";"
    skip_empty_columns => true
    columns => ["SKU","Kategoria","Rodzina","Marka","Kod u dostawcy","Kod u dostawcy2","Nazwa towaru"]
  }

  ruby {
    code => '
      c = event.get("Kategoria")
      if c
        a = c.split(",")
        event.set("Kategoria", a.shift)
        a.each { |x| event.set(x.strip, x.strip) }
      end
    '
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "products"
    user => "elastic"
    password => "**"
  }
  stdout {}
}

I replaced 'autodetect_column_names => true' with an explicit columns list, because that parameter caused incorrect placement of the columns and their contents.

When I comment out this part of the code, I am able to import all the rows from the CSV:

    ruby {
      code => '
        c = event.get("Kategoria")
        if c
          a = c.split(",")
          event.set("Kategoria", a.shift)
          a.each { |x| event.set(x.strip, x.strip) }
        end
      '
    }
