Error parsing csv

Hi, I'm reading a CSV file and sending it to Elasticsearch. Everything works fine, but some rows throw an exception:

Error parsing csv {:field=>"message", :source=>"6123464a-420f-4838-8ecb-fcce87f16297;208783762198", :exception=>#<ThreadError: interrupted in FiberQueue.pop>}

I tried googling and also searching this forum, but I couldn't find any results for this exception message.
What could be causing this exception? After the data ingestion finishes, I can see all the rows except those shown in the exception, which are missing.

The conf is:

input {
  file {
    path => "/usr/share/logstash/files/file.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  csv {
    separator => ";"
    skip_header => true
    skip_empty_rows => true
    columns => [ "id", "field1", "field2" ]
  }

  mutate {
    copy => { "id" => "[@metadata][_id]"}
    strip => [ "field1", "field2" ]
    remove_field => [ "id", "message", "@timestamp", "path", "host", "@version", "event", "log" ]
  }
}

output {
#  stdout { codec => rubydebug { metadata => true } }
  elasticsearch {
    ...
  }
}

What version are you running?

What is the file size?
Have you tried increasing the memory?
In the config/jvm.options file, set these values to 4-8 GB, and don't use rubydebug as in the config you posted.

-Xms1g
-Xmx1g

@stephenb I'm using 8.4.1. I'll take a look at the link you posted.

The file is ~500 MB and has ~4.5M rows.

@Rios I'm running ELK with docker, and I have set:

For Elasticsearch:
ES_JAVA_OPTS: -Xms2G -Xmx2G

For Logstash:
LS_JAVA_OPTS: -Xms1G -Xmx1G

Also, I've assigned 10 GB in my Docker Desktop resources.

and don't use rubydebug as you posted.
No, I'm not using it. It's a comment in the file.

EDIT:
I've increased LS_JAVA_OPTS to 2G in my docker-compose file and now it's getting worse. I'll paste part of the log output below:


[2022-09-10T13:08:23,483][INFO ][logstash.outputs.elasticsearch][my-pipeline] Using a default mapping template {:es_version=>8, :ecs_compatibility=>:v8}
[153.119s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 2048k, guardsize: 0k, detached.
[153.122s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 2048k, guardsize: 0k, detached.
[153.123s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 2048k, guardsize: 0k, detached.
[153.123s][warning][os,thread] Failed to start the native thread for java.lang.Thread "Ruby-0-Fiber-77912"
[153.124s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 2048k, guardsize: 0k, detached.
[153.124s][warning][os,thread] Failed to start the native thread for java.lang.Thread "Ruby-0-Fiber-77911"
[153.127s][warning][os,thread] Failed to start the native thread for java.lang.Thread "Ruby-0-Fiber-77913"
[153.127s][warning][os,thread] Failed to start the native thread for java.lang.Thread "Ruby-0-Fiber-77910"
[153.127s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 2048k, guardsize: 0k, detached.
[153.127s][warning][os,thread] Failed to start the native thread for java.lang.Thread "Ruby-0-Fiber-77914"
[153.127s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 2048k, guardsize: 0k, detached.
[153.128s][warning][os,thread] Failed to start the native thread for java.lang.Thread "Ruby-0-Fiber-77915"

[2022-09-10T13:10:34,649][FATAL][org.logstash.Logstash    ][my-pipeline] uncaught error (in thread [my-pipeline]>worker5)

java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
at java.lang.Thread.start0(Native Method) ~[?:?]
at java.lang.Thread.start(java/lang/Thread.java:802) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.addWorker(java/util/concurrent/ThreadPoolExecutor.java:945) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.execute(java/util/concurrent/ThreadPoolExecutor.java:1364) ~[?:?]
at org.jruby.ext.fiber.ThreadFiber.createThread(org/jruby/ext/fiber/ThreadFiber.java:284) ~[jruby.jar:?]
at org.jruby.ext.fiber.ThreadFiber.initialize(org/jruby/ext/fiber/ThreadFiber.java:56) ~[jruby.jar:?]
at uri_3a_classloader_3a_.jruby.kernel.enumerator.reset(uri:classloader:/jruby/kernel/enumerator.rb:111) ~[?:?]
at uri_3a_classloader_3a_.jruby.kernel.enumerator.next(uri:classloader:/jruby/kernel/enumerator.rb:93) ~[?:?]
at uri_3a_classloader_3a_.jruby.kernel.enumerator.next(uri:classloader:/jruby/kernel/enumerator.rb:17) ~[?:?]
at usr.share.logstash.vendor.jruby.lib.ruby.stdlib.csv.invokeOther2:next(usr/share/logstash/vendor/jruby/lib/ruby/stdlib//usr/share/logstash/vendor/jruby/lib/ruby/stdlib/csv.rb:1316) ~[?:?]
at usr.share.logstash.vendor.jruby.lib.ruby.stdlib.csv.shift(/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/csv.rb:1316) ~[?:?]
at usr.share.logstash.vendor.jruby.lib.ruby.stdlib.csv.invokeOther1:shift(usr/share/logstash/vendor/jruby/lib/ruby/stdlib//usr/share/logstash/vendor/jruby/lib/ruby/stdlib/csv.rb:700) ~[?:?]
at usr.share.logstash.vendor.jruby.lib.ruby.stdlib.csv.parse_line(/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/csv.rb:700) ~[?:?]
at usr.share.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_filter_minus_csv_minus_3_dot_1_dot_1.lib.logstash.filters.csv.filter(/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-filter-csv-3.1.1/lib/logstash/filters/csv.rb:141) ~[?:?]
at usr.share.logstash.logstash_minus_core.lib.logstash.filters.base.do_filter(/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:159) ~[?:?]
at usr.share.logstash.logstash_minus_core.lib.logstash.filters.base.multi_filter(/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:178) ~[?:?]
at org.jruby.RubyArray.each(org/jruby/RubyArray.java:1865) ~[jruby.jar:?]
at usr.share.logstash.logstash_minus_core.lib.logstash.filters.base.multi_filter(/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:175) ~[?:?]
at org.logstash.config.ir.compiler.FilterDelegatorExt.doMultiFilter(org/logstash/config/ir/compiler/FilterDelegatorExt.java:127) ~[logstash-core.jar:?]
at org.logstash.config.ir.compiler.AbstractFilterDelegatorExt.multi_filter(org/logstash/config/ir/compiler/AbstractFilterDelegatorExt.java:134) ~[logstash-core.jar:?]
at org.logstash.generated.CompiledDataset1.compute(org/logstash/generated/CompiledDataset1) ~[?:?]
at org.logstash.generated.CompiledDataset2.compute(org/logstash/generated/CompiledDataset2) ~[?:?]
at org.logstash.generated.CompiledDataset3.compute(org/logstash/generated/CompiledDataset3) ~[?:?]
at org.logstash.generated.CompiledDataset4.compute(org/logstash/generated/CompiledDataset4) ~[?:?]
at org.logstash.config.ir.CompiledPipeline$CompiledUnorderedExecution.compute(org/logstash/config/ir/CompiledPipeline.java:347) ~[logstash-core.jar:?]
at org.logstash.config.ir.CompiledPipeline$CompiledUnorderedExecution.compute(org/logstash/config/ir/CompiledPipeline.java:341) ~[logstash-core.jar:?]
at org.logstash.execution.WorkerLoop.run(org/logstash/execution/WorkerLoop.java:87) ~[logstash-core.jar:?]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(jdk/internal/reflect/NativeMethodAccessorImpl.java:77) ~[?:?]
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(jdk/internal/reflect/DelegatingMethodAccessorImpl.java:43) ~[?:?]
at java.lang.reflect.Method.invoke(java/lang/reflect/Method.java:568) ~[?:?]
at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(org/jruby/javasupport/JavaMethod.java:442) ~[jruby.jar:?]
at org.jruby.javasupport.JavaMethod.invokeDirect(org/jruby/javasupport/JavaMethod.java:306) ~[jruby.jar:?]
at usr.share.logstash.logstash_minus_core.lib.logstash.java_pipeline.start_workers(/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:300) ~[?:?]
at org.jruby.RubyProc.call(org/jruby/RubyProc.java:309) ~[jruby.jar:?]
at java.lang.Thread.run(java/lang/Thread.java:833) [?:?]

[2022-09-10T13:10:34,649][FATAL][org.logstash.Logstash    ][my-pipeline] uncaught error (in thread [my-pipeline]>worker3)
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
...

I'm running a file with a size of ~500mb

This will require at least 4-8 GB for LS. Can you try the same file with setting:

LS_JAVA_OPTS: -Xms4G -Xmx4G

Also try splitting the file into 50-100 MB parts. Is that possible?

And since your ls.conf has simple logic, have you thought about Filebeat?
To be honest, I haven't tried a 500 MB file yet. Without testing I can't clearly recommend FB or LS, but since the logic is simple, I would try it.

No, I haven't tried Filebeat. I'll take a look.

Yesterday I created another index from another file with 4.5M rows, applying some transformations such as adding fields looked up in another index, and I had no problems. I don't know why this is happening here.

unable to create native thread: possibly out of memory or process/resource limits reached
This is the reason. Test with 4, 6, 8 GB...

LS_JAVA_OPTS: -Xms4G -Xmx4G

Did you check the link @stephenb shared? There is a bug in version 8.4.* that is probably the cause of your issue.

There is a GitHub issue already.

You will need to wait for an updated version of Logstash that fixes this, but there is a workaround.

Judging by your csv filter, your CSV file looks pretty simple, and you could use dissect to parse it.

If your csv file looks like this:

id;field1;field2

You can use the following dissect filter:

dissect {
    mapping => {
        "message" => "%{id};%{field1};%{field2}"
    }
}
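
For context, here is a minimal sketch of how the whole filter block might look with dissect swapped in for csv. It reuses the mutate settings from the original config. The header check is an assumption: dissect has no equivalent of skip_header, so this sketch drops a line that is literally "id;field1;field2"; adjust it to the actual header of your file.

```
filter {
  # Assumption: the header line is literally "id;field1;field2".
  # dissect has no skip_header option, so drop that line explicitly.
  if [message] == "id;field1;field2" {
    drop { }
  }

  # Split each line on ";" into the three fields (replaces the csv filter).
  dissect {
    mapping => {
      "message" => "%{id};%{field1};%{field2}"
    }
  }

  # Unchanged from the original config.
  mutate {
    copy => { "id" => "[@metadata][_id]" }
    strip => [ "field1", "field2" ]
    remove_field => [ "id", "message", "@timestamp", "path", "host", "@version", "event", "log" ]
  }
}
```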

Thanks @leandrojmp I'm on it.
I'll try dissect and see how it works for me.
Thanks!

Thanks @leandrojmp, this worked and is extremely fast compared with csv.
I just need to see what's happening with some rows, because I'm getting this message:
Dissector mapping, field found in event but it was empty

Thanks!
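
One possible explanation for that warning, offered as an assumption rather than a confirmed diagnosis: the original csv filter had skip_empty_rows enabled, whereas dissect receives blank lines as events with an empty message field. If that is the case here, a sketch like the following would drop those events before they reach dissect:

```
filter {
  # Assumption: the warning comes from blank lines in the file.
  # Drop events whose message is empty or contains only whitespace.
  if [message] =~ /^\s*$/ {
    drop { }
  }

  dissect {
    mapping => {
      "message" => "%{id};%{field1};%{field2}"
    }
  }
}
```

If the warning persists after this, the rows in question are worth inspecting directly, for example by temporarily enabling the commented-out stdout output.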