Elastic Filter

I import a CSV file into Elasticsearch via the Logstash csv filter. One of the cells in the CSV file contains several comma-separated strings, for example: (category, subcategory, sub_subcategory). I would like to split these strings into separate fields.
On the forums people say that I need to use the add_field option, but I do not know exactly how.

Example:
ID|Name|Category   <- column names in the CSV
1, Sofa, "Furniture, Soft furnishings, TestString"   <- row in the CSV

What I need to be stored in ELK:
"_source": {
  "ID": 1,
  "Name": "Sofa",
  "Category": "Furniture",
  "SubCategory": "Soft furnishings",
  "SubSubCategory": "TestString"
}
What I did:
input {
  file {
    path => "E:/ELK/db/db.csv"
    start_position => "beginning"
    sincedb_path => "NULL"
  }
}

filter {
  csv {
    separator => ";"
    columns => ["Id","Name","Category"]
  }
}

output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "products"
    user => "elastic"
    password => "**"
  }
  stdout {}
}

Hi @oleksiiorel, welcome to the community.

Perhaps dissect is a better choice: it gives you more control and is also more performant.

Can you please show an example of its use in my case? I could not figure out dissect. I tried something like this:
filter {
  if [message] =~ "Category," {
    dissect {
      mapping => {
        "message" => "%{Category} %{Subcategory} %{SubSubCategory}"
      }
    }
    csv {
      separator => ";"
      columns => ["Id","Name","Category"]
    }
  }
}

output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "product"
    user => "elastic"
    password => "**"
  }
  stdout {}
}

But I must be doing something wrong... I'm new to ELK.

dissect is literal: the pattern has to include the spaces and quotes exactly as they appear in the data.

Data

ID|Name|Category
1, Sofa, "Furniture, Soft furnishings, TestString"
2, Desk Chair, "Furniture, Hard furnishings, OtheString"

Code

input {
  file { 
    path => "/Users/sbrown/workspace/sample-data/discuss/dissect/discuss.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null" 
  }
}

filter {
  dissect {
    mapping => {
      "message" => '%{product_id}, %{name}, "%{category}, %{sub_category}, %{alt_category}"'
      # 1, Sofa, "Furniture, Soft furnishings, TestString"
    }
  }
}

output { 
  stdout {} 
}

Output (note that dissect failed on the first line, the header row, which is OK):

{
         "event" => {
        "original" => "ID|Name|Category"
    },
          "host" => {
        "name" => "hyperion"
    },
      "@version" => "1",
          "tags" => [
        [0] "_dissectfailure"
    ],
       "message" => "ID|Name|Category",
           "log" => {
        "file" => {
            "path" => "/Users/sbrown/workspace/sample-data/discuss/dissect/discuss.csv"
        }
    },
    "@timestamp" => 2023-04-01T15:23:34.416033Z
}
{
            "host" => {
        "name" => "hyperion"
    },
    "sub_category" => "Soft furnishings",
             "log" => {
        "file" => {
            "path" => "/Users/sbrown/workspace/sample-data/discuss/dissect/discuss.csv"
        }
    },
            "name" => "Sofa",
      "product_id" => "1",
           "event" => {
        "original" => "1, Sofa, \"Furniture, Soft furnishings, TestString\""
    },
        "@version" => "1",
         "message" => "1, Sofa, \"Furniture, Soft furnishings, TestString\"",
        "category" => "Furniture",
      "@timestamp" => 2023-04-01T15:23:34.416294Z,
    "alt_category" => "TestString"
}
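
If the header event should not be indexed at all, one option is to drop anything that dissect could not parse. This is only a sketch, and it assumes that discarding every line tagged _dissectfailure is acceptable, which also throws away genuinely malformed rows:

```conf
filter {
  dissect {
    mapping => {
      "message" => '%{product_id}, %{name}, "%{category}, %{sub_category}, %{alt_category}"'
    }
  }

  # Assumption: any event dissect could not parse (such as the
  # "ID|Name|Category" header line) can simply be discarded.
  if "_dissectfailure" in [tags] {
    drop { }
  }
}
```

While debugging a pattern it is probably better to leave the drop out, so that failed events still show up in stdout with their tag.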

Data

ID,Name,Category
1,Sofa,"Furniture,Soft furnishings,TestString"
2,Desk Chair,"Furniture,Hard furnishings,OtheString"
3,Desk Chair,"Furniture,Hard furnishings,OtheString"

# 1, Sofa, "Furniture, Soft furnishings, TestString" <- is this line only there as a test comment?
I tried this code:
input {
  file {
    path => "E:/test.csv"
    start_position => "beginning"
    sincedb_path => "NULL"
  }
}

filter {
  dissect {
    mapping => {
      "message" => '%{product_id}, %{name}, "%{category}, %{sub_category}, %{alt_category}"'
    }
  }
}

output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "test_products"
    user => "elastic"
    password => "**"
  }
  stdout {}
}

Then copied yours:
input {
  file {
    path => "E:/test.csv"
    start_position => "beginning"
    sincedb_path => "NULL"
  }
}

filter {
  dissect {
    mapping => {
      "message" => '%{product_id}, %{name}, "%{category}, %{sub_category}, %{alt_category}"'
      # 1, Sofa, "Furniture, Soft furnishings, TestString"
    }
  }
}

output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "test_products2"
    user => "elastic"
    password => "*"
  }
  stdout {}
}


But the result is the same:
 "_index": "test_products",
        "_id": "IcWTPYch3_",
        "_score": 1,
        "_source": {
          "host": {
            "name": "DESKTOP-5QFLCFM"
          },
          "message": """1,Sofa,"Furniture,Soft furnishings,TestString"
""",
          "event": {
            "original": """1,Sofa,"Furniture,Soft furnishings,TestString"
"""
          },
          "@version": "1",
          "@timestamp": "2023-04-01T16:07:58.384915200Z",
          "tags": [
            "_dissectfailure"
          ],
          "log": {
            "file": {
              "path": "C:/test.csv"
            }
          }
        }
      },
      {
        "_index": "test_products",
        "_id": "IsWTPYcBFY9QtlnpVh3_",
        "_score": 1,
        "_source": {
          "host": {
            "name": "DESKTOP-5QFLCFM"
          },
          "message": """2,Desk Chair,"Furniture,Hard furnishings,OtheString"
""",
          "event": {
            "original": """2,Desk Chair,"Furniture,Hard furnishings,OtheString"
"""
          },
          "@version": "1",
          "@timestamp": "2023-04-01T16:07:58.387915800Z",
          "tags": [
            "_dissectfailure"
          ],
          "log": {
            "file": {
              "path": "C:/test.csv"
            }
          }
        }
      }
    ]
 

Some kind of magic: it works for you and not for me. Any idea what the reason is?
The main reason is naturally me =)

dissect is literal, as I said, so if your data looks like the sample below, with no spaces after the commas...

ID,Name,Category
1,Sofa,"Furniture,Soft furnishings,TestString"
2,Desk Chair,"Furniture,Hard furnishings,OtheString"
3,Desk Chair,"Furniture,Hard furnishings,OtheString"

...then your dissect pattern should also have no spaces after the commas:

"message" => '%{product_id},%{name},"%{category},%{sub_category},%{alt_category}"'

That just worked fine for me.
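
For completeness, here is roughly how that pattern sits in a full filter block. The mutate convert step is an addition of mine, not something tested in this thread; it assumes you want product_id stored as a number (as in the desired "ID": 1), since dissect produces strings:

```conf
filter {
  dissect {
    mapping => {
      # No spaces after the commas, matching rows such as:
      # 1,Sofa,"Furniture,Soft furnishings,TestString"
      "message" => '%{product_id},%{name},"%{category},%{sub_category},%{alt_category}"'
    }
  }

  # Assumption: product_id should be indexed as an integer.
  # dissect captures every field as a string.
  mutate {
    convert => { "product_id" => "integer" }
  }
}
```

The header line will still be tagged _dissectfailure with this pattern, because it has no quoted third column.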


The dissect pattern needs to match the data exactly. Your sample data does not have spaces after the commas, whereas the data Stephen is using does. Either remove the spaces from your dissect pattern or fix the input data.
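
If you would rather keep the csv filter from the original question, another route is to let csv handle the quoted cell and then run dissect on that one field. This is an untested sketch under a few assumptions: the file is comma-separated with no space before the opening quote, and CategoryPath is a made-up temporary field name:

```conf
filter {
  # Step 1: csv splits the row and unwraps the quoted third cell.
  csv {
    separator => ","
    columns => ["ID", "Name", "CategoryPath"]   # CategoryPath is hypothetical
  }

  # Step 2: dissect splits that single cell into three fields.
  # remove_field is only applied when dissect succeeds.
  dissect {
    mapping => {
      "CategoryPath" => "%{Category},%{SubCategory},%{SubSubCategory}"
    }
    remove_field => ["CategoryPath"]
  }

  # Assumption: trims stray spaces if the cell reads
  # "Furniture, Soft furnishings, TestString".
  mutate {
    strip => ["SubCategory", "SubSubCategory"]
  }
}
```

The header row would still end up tagged _dissectfailure here, since its third column contains no commas.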


Mr. Stephen, Thank you for your forbearance! You made my day beautiful:)


Thank you so much!!!
