Elastic Dissect

Hi everyone!
How do I split a single cell containing a varying number of strings in a CSV file into separate cells (fields), assigning the field names from the strings themselves? If I manually write the field names and their count (like in the example below), it works only when every string is present in each cell, but in reality the number of strings per cell varies.

Example

Data (CSV file)

ID,Name,Category
1,Desk Chair,"Furniture,Hard furnishings,categoryE7E7163,Hendi,"
2,Sofa,"Furniture,Soft furnishings, Bartscher"

input {
  file {
    path => "E:/test.csv"
    start_position => "beginning"
    sincedb_path => "NULL"
  }
}

filter {
  dissect {
    mapping => {
      "message" => '%{product_id},%{name},"%{category},%{sub_category},%{alt_category},%{brand}"'
      # it's okay for id=1 (Desk Chair) but not right for id=2 (Sofa)
    }
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "test_products"
    user => "elastic"
    password => "**"
  }
  stdout {}
}

What I need stored in Elasticsearch:

"_source": {
"ID": 1,
"Name": "Desk Chair",
"Category": "Furniture",
"Hard furnishings": "Hard furnishings",
"categoryE7E7163": "categoryE7E7163",
"Hendi": "Hendi",
}

"_source": {
"ID": 2,
"Name": "Sofa",
"Category": "Furniture",
"Soft furnishings": "Soft furnishings",
"categoryE7E7163": "categoryE7E7163",
"Bartscher": "Bartscher",
}

What do you mean by:

"it's okay for id=1 (Desk Chair) but not right for id=2 (Sofa)"

What does this produce?

BTW I'm moving your question to Logstash

Hi, David!
At the moment every event comes out tagged "_dissectfailure".
Let me try to rephrase: I need to write a universal mapping for 'message' that splits the strings into separate fields regardless of how many there are, and each field name must match the string value in the cell. Here is the working code:
Data

ID,Name,Category
1,Sofa,"Furniture,Soft furnishings,TestString"
2,Desk Chair,"Furniture,Hard furnishings,TestString2"

input {
  file {
    path => "E:/test.csv"
    start_position => "beginning"
    sincedb_path => "NULL"
  }
}

filter {
  dissect {
    mapping => {
      "message" => '%{product_id},%{name},"%{category},%{sub_category},%{alt_category}"'
    }
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "test_products5"
    user => "elastic"
    password => "**"
  }
  stdout {}
}

The problem is that the Category for one row may contain 3 strings and for another 6, and the names differ, so I decided it is better to take the field names from the strings themselves.
The example above works because I put the same number of strings into Category in the CSV file, but in the real database each row has a different number of strings.

Real Data Example

ID,Name,Category
1,Sofa,"1_brands,1_series,1_smallwares,1_storage_and_transport,2_brand_contacto,2_food_storage_supplies,2_food_storage_supplies_2,2_series_501_1000,3_gastronorm_containers,3_gastronorm_containers_2,3_series_contacto_gn_container_melamin,categoryE7E7163"
2,Desk Chair,"1_brands,1_series,1_smallchair"

I found a ''dot key'' solution which might help me, but I don't know how to implement it. I guess it should look like this:

filter {
  dissect {
    mapping => {
      "message" => '%{product_id},%{name},"%{.},%{.},%{.},%{.},%{.},%{.}"'
    }
  }
}

but it doesn't work, and I also don't know how to deal with a different number of Category strings. To put it simply, I need to somehow save the data from Category into separate JSON fields.

I would not use dissect at all. Try

    csv { autodetect_column_names => true }
    ruby {
        code => '
            # Grab the comma-separated Category value
            c = event.get("Category")
            if c
                a = c.split(",")
                # Keep the first entry as the Category field
                event.set("Category", a.shift)
                # Turn each remaining entry into a field named after itself
                a.each { |x| event.set(x.strip, x.strip) }
            end
        '
    }
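
To see what that does, here is a standalone Ruby sketch of the same logic, hand-traced against the second sample row (the event is stood in for by a plain Hash; this is my illustration, not actual Logstash output):

    # Simulate the filter against row 2: Category = "Furniture,Soft furnishings, Bartscher"
    event = { "Category" => "Furniture,Soft furnishings, Bartscher" }
    c = event["Category"]
    if c
      a = c.split(",")                         # ["Furniture", "Soft furnishings", " Bartscher"]
      event["Category"] = a.shift              # Category keeps the first entry, "Furniture"
      a.each { |x| event[x.strip] = x.strip }  # self-named fields for the rest
    end
    p event
    # {"Category"=>"Furniture", "Soft furnishings"=>"Soft furnishings", "Bartscher"=>"Bartscher"}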

You will need pipeline.ordered to evaluate to true and pipeline.workers to be 1, otherwise autodetect_column_names cannot guarantee that the header row is processed first.
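
For reference, a minimal sketch of those two settings in logstash.yml (pipeline.workers can also be set with the -w command-line flag):

    # logstash.yml
    pipeline.workers: 1    # single worker, so events are processed in order
    pipeline.ordered: true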

In your file input sincedb_path => "NULL" will result in the in-memory sincedb being persisted across restarts in a file called NULL in logstash's working directory. If you do not want it persisted then use sincedb_path => "NUL".
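
In other words, on Windows the input from the question would become something like this (same path as in the original post):

    input {
      file {
        path => "E:/test.csv"
        start_position => "beginning"
        sincedb_path => "NUL"    # Windows null device: nothing is persisted
      }
    }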


Works perfectly! Thank you very much!

Hi man!
Do you have any idea why, when I add your code, only 6873 objects are imported into Elasticsearch instead of 53k? When I remove the ruby code, all the elements are stored. I don't get any errors.

Data

SKU;Kategoria;Rodzina;Marka;Kod u dostawcy;Kod u dostawcy 2;Nazwa towaru
00000290;1_brands,1_restaurant_equipment,2_brand_rm_gastro,2_food_holding_and_warming_equipment,3_steam_heaters_and_buffets,4_bain_marie_heaters,categoryE7E7163;bemary grzewcze;rm gastro;00000290;BMPD 2120;Bemar elektryczny 2-komorowy, GN 1/1, jezdny z kranem spustowym, 1,4 kW;

Code

input {
  file {
    path => "C:/db.csv"
    start_position => "beginning"
    sincedb_path => "NULL"
  }
}

filter {
  csv {
    separator => ";"
    skip_empty_columns => true
    columns => ["SKU","Kategoria","Rodzina","Marka","Kod u dostawcy","Kod u dostawcy2","Nazwa towaru"]
  }

  ruby {
    code => '
      c = event.get("Kategoria")
      if c
        a = c.split(",")
        event.set("Kategoria", a.shift)
        a.each { |x| event.set(x.strip, x.strip) }
      end
    '
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "products"
    user => "elastic"
    password => "**"
  }
  stdout {}
}

I replaced 'autodetect_column_names => true' with an explicit columns list, because that parameter caused incorrect placement of the columns and their contents.

When I comment out this part of the code, I am able to import all the rows from the CSV:

    ruby {
      code => '
        c = event.get("Kategoria")
        if c
          a = c.split(",")
          event.set("Kategoria", a.shift)
          a.each { |x| event.set(x.strip, x.strip) }
        end
      '
    }
