Debugging "Failed Documents" from Data Visualizer CSV Ingestion

wpm · August 28, 2022, 9:39pm

I have a CSV file I'm trying to convert into an index via Kibana.

I go to Machine Learning/Data Visualizer and upload my file. The "explanations" look good to me, as do all the guesses for the column types, so I hit "import". About 1,500 of 3,000 CSV lines could not be imported. The message says "This could be due to lines not matching the Grok pattern." and beneath that are the details of the error.

The data is proprietary, so I can't share it. But say that there is a "Notes" column that Kibana correctly determines to be of type text. The problematic lines are all in that column and of the form "Lorem ipsum; 99-9999, sit amet". The error message for each is "unable to convert [99-9999] to long".

My specific question is: why is Kibana trying to convert this string to a numeric type when it appears in a text field? My more general question is: how do I go about debugging this?

I thought I could look at the Pipeline that gets generated and find some error to fix, but that just appears to import the "Notes" column as text. It's not doing anything wrong.

stephenb · August 29, 2022, 12:07am

What version are you on?

Is that a single column or multiple columns?

Are all the fields quoted with "?

What IS the delimiter?

Perhaps you have inconsistent quoting or delimiters, etc.

Can you give us more of the "shape" of the data? What do a couple of columns/rows look like?

And you can change all of this by going to the Advanced tab.

This parsed fine for me on version 8.4, except that I changed the ID to an integer; it had originally detected it as a keyword.

 id,notes,code
 1234,"Lorem ipsum; 99-9999, sit amet","Code"
 1234,"Lorem ipsum; 99-9999, sit amet","Code"
 1234,"Lorem ipsum; 99-9999, sit amet","Code"

wpm · August 29, 2022, 1:10am

Version 8.4.

That field is a single column. In the CSV, the columns to its left and right look like this:

 999.9999,"Lorem ipsum; 99-9999, sit amet",x

I was thinking this might be because there's a comma in the middle of the text, but in the CSV the whole cell is double-quoted and the advanced box in the Kibana UI had " as the quote delimiter.
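One quick way to test the quoting hypothesis outside Kibana is to parse a sample row with Python's standard csv module. This is only a local sanity check, not what Kibana itself runs. It assumes a comma delimiter and " as the quote character, matching the Advanced tab settings described above.

```python
import csv
import io

def split_row(line):
    """Parse one CSV line using a comma delimiter and double-quote quoting."""
    return next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))

# The row shape from the post: the quoted cell keeps its embedded comma.
quoted = split_row('999.9999,"Lorem ipsum; 99-9999, sit amet",x')
# The same row without quotes: the text cell splits and later columns shift right.
unquoted = split_row('999.9999,Lorem ipsum; 99-9999, sit amet,x')

print(len(quoted), quoted)      # → 3 ['999.9999', 'Lorem ipsum; 99-9999, sit amet', 'x']
print(len(unquoted), unquoted)  # → 4 ['999.9999', 'Lorem ipsum; 99-9999', ' sit amet', 'x']
```

If every real row comes back with the same field count as the header, unbalanced quoting on that column is probably not the culprit.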

stephenb · August 29, 2022, 1:52am

I would look through your data. I suspect some rows have quotes around that field and some don't. Or it's not that column at all: one of your numeric columns may sometimes contain a number and sometimes not. That could be another explanation.
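Looking through 3,000 rows by hand is tedious, so here is a minimal local sketch of the two checks suggested above: rows whose field count differs from the header, and numeric columns holding values that don't parse. The function name, the sample rows, and the choice of `int` for the `id` column are all hypothetical; pick the parser that matches the type you expect (Elasticsearch `long` roughly corresponds to `int` here).

```python
import csv
import io

def find_suspect_rows(csv_text, numeric_columns):
    """Return (line_number, reason) pairs for rows likely to fail ingestion.

    numeric_columns maps a column name to a parser such as int or float.
    A row is flagged if its field count differs from the header, or if a
    numeric column holds a value the parser rejects.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    problems = []
    for row in reader:
        line = reader.line_num  # physical line on which this row ends
        if len(row) != len(header):
            problems.append((line, f"expected {len(header)} fields, got {len(row)}"))
            continue
        record = dict(zip(header, row))
        for col, parse in numeric_columns.items():
            value = record[col]
            if value == "":
                continue  # this sketch treats empty cells as missing, not invalid
            try:
                parse(value)
            except ValueError:
                problems.append((line, f"{col}={value!r} does not parse as {parse.__name__}"))
    return problems

# Hypothetical sample: line 3 has a non-numeric id, line 4 an unquoted comma.
sample = (
    "id,notes,code\n"
    '1234,"Lorem ipsum; 99-9999, sit amet",Code\n'
    "99-9999,Lorem ipsum,Code\n"
    "5678,Lorem ipsum, sit amet,Code\n"
)
for line, reason in find_suspect_rows(sample, {"id": int}):
    print(line, reason)
```

On this sample it reports line 3 (`id='99-9999' does not parse as int`) and line 4 (`expected 3 fields, got 4`), which are exactly the two failure modes suspected above.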

You can still import it and then check the failed rows, etc.

Everything it's doing is right there in that Advanced tab.

wpm · August 29, 2022, 2:32pm

It seems like the best thing to do is start with a very small data set and write my own pipeline.

I authored a pipeline that just works on these three columns. Now I want to send one line of my CSV file through it. How do I do that?

I don't see how to tell the data file visualizer to use my custom pipeline. (I don't think that's its purpose.) I see documentation online like this blog post that describes how to convert your CSV into JSON documents outside of Kibana and then index them in the usual manner.

Is there an out-of-the-box way to ingest CSVs? Should I be tinkering with data file visualizer? Or should I just write my own CSV-to-JSON conversion script?
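For the "write my own conversion script" option, a minimal sketch using only the Python standard library could turn CSV text into the newline-delimited JSON body that the Elasticsearch `_bulk` API accepts. The index name `my-csv-index` and the `converters` argument are hypothetical, and the sketch only builds the body; it does not send anything.

```python
import csv
import io
import json

def csv_to_bulk_ndjson(csv_text, index_name, converters=None):
    """Convert CSV text into an NDJSON body for the Elasticsearch _bulk API.

    Each row becomes an "index" action line followed by the document itself.
    converters optionally maps column names to functions such as int or float.
    """
    converters = converters or {}
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        # DictReader stores extra fields under the key None and fills
        # missing fields with None; refuse such rows instead of guessing.
        if None in row or None in row.values():
            raise ValueError(f"line {reader.line_num}: field count does not match header")
        doc = {}
        for key, value in row.items():
            if value == "":
                continue  # leave empty cells out of the document entirely
            convert = converters.get(key)
            doc[key] = convert(value) if convert else value
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # _bulk expects newline-delimited JSON, including a final newline.
    return "".join(line + "\n" for line in lines)

sample = 'id,notes\n1234,"Lorem ipsum; 99-9999, sit amet"\n'
print(csv_to_bulk_ndjson(sample, "my-csv-index", {"id": int}), end="")
```

The output can be sent as the body of a `POST /_bulk` request with `Content-Type: application/x-ndjson`. Raising on malformed rows is a deliberate choice here: it surfaces the same kind of bad line that Data Visualizer reports as a failed document, but with a line number.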

stephenb · August 29, 2022, 2:38pm

What kind of pipeline? An ingest pipeline?

Can you show me?

You can just cut and paste it into the Advanced tab.

You could also create your own pipeline and use Filebeat, if you plan to repeat this many times.

wpm · August 29, 2022, 2:50pm

After reading online a bit more: is this ingestion stuff what Logstash is for?

Let me back up. I know how NoSQL databases work, and a few years ago I played with the development UI in Kibana, so I understand the core of the product. But now I'm trying to actually do something with it, so I need a better overview.

I'm a data scientist. I work with lots of different data sets. A lot of times they come in as CSV files. I need an easy way to ingest and analyze that data.

What I want to do is a demo for my colleagues where I open up some piece of software, load in a CSV file, push a few buttons, make a few pretty graphs, and then say, "See how easy this is? Now can we please stop writing our own dashboard software and concentrate on the things we're good at?"

Ideally I'd like to sit quietly in the back of a planning meeting where people are discussing how we're going to spend months writing visualizations and then have the visualizations done by the end of the meeting.

I think this is possible with Elasticsearch/Kibana, but I have to figure out which parts of the system to completely master in order to do it.

wpm · August 29, 2022, 2:59pm

I suspect creating my own pipeline is the way to go.

stephenb · August 29, 2022, 3:10pm

Awesome, totally!

There are Logstash pipelines, which run within Logstash. They work great and are powerful, but they take a little setup work and some coding.

There are ingest pipelines, which run inside Elasticsearch (this is what Data Visualizer uses) and can be called from the REST API, Filebeat, or Data Visualizer.

If you want to do what you described above, I would write an ingest pipeline, then just cut and paste it into Data Visualizer, or use Filebeat. With Filebeat you would basically use a basic setup and then just set the pipeline in the output section.
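As a rough sketch of what such an ingest pipeline might contain, the Python below builds a pipeline body with Elasticsearch's `csv` processor to split the raw line, `convert` processors for numeric columns, and a `remove` processor to drop the raw line. It assumes the raw CSV line arrives in a field named `message` (the field Filebeat uses for each log line); the column names and the helper function are hypothetical. The printed JSON would be the body of `PUT _ingest/pipeline/<pipeline-id>` in Kibana Dev Tools, and that same ID is what Filebeat's `output.elasticsearch.pipeline` setting would name.

```python
import json

def build_csv_pipeline(columns, numeric_types, description="CSV import sketch"):
    """Build an ingest pipeline body that splits a raw CSV line into fields.

    columns lists the CSV header names in order; numeric_types maps a column
    to an Elasticsearch convert type such as "long" or "double".
    """
    processors = [
        {
            "csv": {
                "field": "message",  # assumption: the raw CSV line lives here
                "target_fields": columns,
                "separator": ",",
                "quote": '"',
            }
        }
    ]
    for col, es_type in numeric_types.items():
        processors.append({"convert": {"field": col, "type": es_type, "ignore_missing": True}})
    # Drop the raw line once it has been split into fields.
    processors.append({"remove": {"field": "message"}})
    return {"description": description, "processors": processors}

pipeline = build_csv_pipeline(["id", "notes", "code"], {"id": "long"})
print(json.dumps(pipeline, indent=2))
```

Generating the body from a column list is just a convenience for trying several CSV layouts; writing the JSON by hand in Dev Tools works equally well.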

wpm · August 29, 2022, 3:19pm

Thanks. It helps to have the names of things so I can look up tutorials. Would you say that Visualizer is the out-of-the-box solution and Filebeat is the next step up for customization?

stephenb · August 29, 2022, 3:22pm

Yes. I think you ran into something a bit "wonky" (tech term).
For simple files, Visualizer usually works pretty well.

Writing an ingest pipeline can be a pretty quick dev cycle: look up the _simulate API.
You run it against some sample docs, so the dev-test cycle is very quick.
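This also answers the earlier question of how to send one CSV line through a custom pipeline. A minimal sketch: wrap an inline pipeline and a few raw lines in a `_simulate` request body, then paste the printed JSON into Kibana Dev Tools after `POST _ingest/pipeline/_simulate` (adding `?verbose=true` shows each processor's result). The column names, sample lines, and the `message` field are assumptions for illustration.

```python
import json

def build_simulate_body(pipeline, sample_lines):
    """Wrap a pipeline definition and raw CSV lines in a _simulate request body."""
    return {
        "pipeline": pipeline,
        "docs": [{"_source": {"message": line}} for line in sample_lines],
    }

# A minimal inline pipeline with hypothetical columns matching the thread's example.
pipeline = {
    "processors": [
        {"csv": {"field": "message", "target_fields": ["id", "notes", "code"]}},
        {"convert": {"field": "id", "type": "long"}},
    ]
}

body = build_simulate_body(pipeline, [
    '1234,"Lorem ipsum; 99-9999, sit amet",Code',
    '99-9999,"Lorem ipsum",Code',  # expected to fail the long conversion
])
print(json.dumps(body, indent=2))
```

The second sample should come back with a per-document error from the `convert` processor, similar to the "unable to convert [99-9999] to long" message reported at the top of the thread, while the first should parse cleanly.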

system · September 26, 2022, 3:22pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
