Handling dot-notated fields


(Fen) #1

Running Elasticsearch 6.0.0.

What's the best way to handle fields whose names contain dots? The default behaviour seems to be to store them as strings?

An example:

event.status = "Running"
event.status.id = 33

These will be stored as a string and an int.

Right now I can't get events containing the fields above to index without explicitly converting them to objects in Logstash using mutate:

mutate {
   rename => {
      "event.status" => "[event][status]"
      "event.status.id" => "[event][status][id]"
   }
}

This is a problem because the Logstash configs have to stay in line with whatever data is being sent. What I'd rather do is treat every field with a dot in its name as an object by default (unless someone can think of a pitfall!), and the only way I can think of doing that would be with some Ruby?

I also don't think an explicit mapping in ES would be the best option unless it were dynamic and could handle changes on the fly.
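As a minimal sketch of "treat every dotted field as an object by default", the Ruby below expands a flat hash of dotted keys into nested hashes. The helper name expand_dotted is hypothetical (it is not part of Logstash), and it assumes the event arrives as a flat hash like the example above. Keys are processed shortest path first, which also surfaces the pitfall: once event.status holds a string, event.status.id has nowhere to go, so it is reported as a conflict instead of being silently dropped.

```ruby
# Hypothetical helper (not a Logstash API): expand dotted keys such as
# "event.status.id" into nested hashes {"event" => {"status" => {"id" => ...}}}.
# Returns [nested_hash, conflicting_keys].
def expand_dotted(flat)
  nested = {}
  conflicts = []
  # Fewer dots first, so "event.status" is placed before "event.status.id".
  flat.keys.sort_by { |k| k.count(".") }.each do |key|
    *parents, leaf = key.split(".")
    node = nested
    blocked = false
    parents.each do |part|
      node[part] = {} unless node.key?(part)
      unless node[part].is_a?(Hash)
        blocked = true # a scalar (e.g. a String) already lives at this path
        break
      end
      node = node[part]
    end
    if blocked || node[leaf].is_a?(Hash)
      conflicts << key
    else
      node[leaf] = flat[key]
    end
  end
  [nested, conflicts]
end

nested, conflicts = expand_dotted("event.status" => "Running", "event.status.id" => 33)
p nested    # {"event"=>{"status"=>"Running"}}
p conflicts # ["event.status.id"]
```

Calling this from a Logstash ruby filter would additionally need the event's get/set/remove methods; that wiring is not shown here.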

Looking for a little guidance here; any help is appreciated.


(Val Crettaz) #2

You should check out the de_dot filter plugin; it should do exactly what you need.

You can read more about it in the post from when it came out: https://www.elastic.co/blog/introducing-the-de_dot-filter.

In your concrete example, you'll still have an issue, though: event.status cannot be a string if event.status.id is an int. Since event.status needs to be an object, you'll need another field, e.g. event.status.state, to store the "Running" string.
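The suggestion above can be sketched as a pre-processing step that moves a scalar to "<key>.state" whenever the same key is also the prefix of another field. This is a hypothetical Ruby helper (move_colliding_scalars is not a Logstash or Elasticsearch API) that assumes a flat hash of dotted keys; the suffix "state" is just the example name from the reply.

```ruby
# Hypothetical pre-processing helper (not a Logstash API).
# If a key such as "event.status" is also the prefix of another key
# ("event.status.id"), move its value to "event.status.state" so that
# "event.status" can become an object.
# Note: an existing "<key>.<suffix>" field would be overwritten.
def move_colliding_scalars(flat, suffix = "state")
  result = flat.dup
  flat.each_key do |key|
    prefix = key + "."
    next unless flat.keys.any? { |other| other.start_with?(prefix) }
    result["#{key}.#{suffix}"] = result.delete(key)
  end
  result
end

p move_colliding_scalars("event.status" => "Running", "event.status.id" => 33)
# {"event.status.id"=>33, "event.status.state"=>"Running"}
```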


(Fen) #3

Hey, thanks for the response.

I did check the de_dot plugin, which according to GitHub is deprecated and planned for removal in 7.0. I don't actually want to remove the dots either. The same logic could be applied; it just means the JSON docs won't be nested in Elasticsearch.

And yes, you're totally right about the issue in the example. In fact, that may have been the original issue that made me ask about this.


(Fen) #4

After digging into this a bit more, I'm more confused than when I started.

I've looked at the raw JSON being sent in, and the field names contain dots. What I don't understand is at what point they are being converted to objects.

In my Logstash config I have other fields being converted using the mutate filter posted earlier, but nothing relating to these fields.

Elasticsearch exception:

org.elasticsearch.index.mapper.MapperParsingException: object mapping for [event.status] tried to parse field [event.status] as object, but found a concrete value
	at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:349) ~[elasticsearch-6.0.0.jar:6.0.0]
	at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:470) ~[elasticsearch-6.0.0.jar:6.0.0]
	at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:597) ~[elasticsearch-6.0.0.jar:6.0.0]
	at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:395) ~[elasticsearch-6.0.0.jar:6.0.0]

Sample data:

{
  "event.status": "All done yo",
  "event.id": 777777666,
  "event.status.id": 123,
  "event.status.log.id": 1231233,
  "app.name": "My.Awesome.Application",
  "app.version": "0.15.0.0"
}

And the mapping that was created in Elasticsearch:

      "event": {
        "properties": {
          "id": {
            "type": "long"
          },
          "status": {
            "properties": {
              "id": {
                "type": "long"
              },
              "log": {
                "properties": {
                  "id": {
                    "type": "long"
                  }
                }
              }
            }
          }
        }
      }

If dotted fields that share the same prefix are turned into an object, then I'd understand, but I haven't seen that stated anywhere.

Would I be better off just replacing all the dots with - or _? That would be painful, though, where we are sent genuinely nested JSON, since that gets mapped to dot-notated fields and I'd be left with a mix of field-naming conventions.
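For comparison, replacing the dots, similar in spirit to the de_dot filter mentioned earlier, can be sketched in a few lines of Ruby. replace_dots is a hypothetical helper, not the de_dot plugin itself, and it assumes a flat hash of field names. It sidesteps the object/scalar conflict, but as noted above, genuinely nested JSON would still show up under dotted paths, so two naming styles would coexist.

```ruby
# Hypothetical helper (not the de_dot plugin): replace every "." in the
# top-level keys of a flat hash with a separator. Values are left untouched.
# Keys that differ only by "." vs. the separator collide; the later one wins.
def replace_dots(flat, separator = "_")
  flat.each_with_object({}) do |(key, value), out|
    out[key.gsub(".", separator)] = value # String pattern: matches "." literally
  end
end

p replace_dots("event.status" => "Running", "event.status.id" => 33)
# {"event_status"=>"Running", "event_status_id"=>33}
```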


(Fen) #5

I believe I've just run into this again, but it seems a bit nastier this time. I ran it on two different versions of Logstash, 6.0.0 and 6.2.4.

This time Logstash is throwing the following error:

[2018-05-10T11:22:21,331][ERROR][logstash.pipeline ] Exception in pipelineworker, the pipeline stopped processing new events, please check your filter configuration and restart Logstash. {:pipeline_id=>"main", "exception"=>"org.jruby.RubyString cannot be cast to org.logstash.ConvertedList", "backtrace"=>["org.logstash.Accessors.setChild(Accessors.java:107)", "org.logstash.Accessors.set(Accessors.java:16)", "org.logstash.Event.setField(Event.java:163)", "org.logstash.ext.JrubyEventExtLibrary$RubyEvent.ruby_set_field(JrubyEventExtLibrary.java:95)", "C_3a_.DevOps.Logstash.$6_dot_2_dot_4_dot_1.logstash_minus_6_dot_2_dot_4.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_filter_minus_mutate_minus_3_dot_3_dot_1.lib.logstash.filters.mutate.RUBY$block$rename$1(C:/DevOps/Logstash/6.2.4.1/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/logstash-filter-mutate-3.3.1/lib/logstash/filters/mutate.rb:281)", "org.jruby.runtime.CompiledIRBlockBody.yieldDirect(CompiledIRBlockBody.java:156)", "org.jruby.runtime.BlockBody.yield(BlockBody.java:114)", "org.jruby.runtime.Block.yield(Block.java:165)", "org.jruby.RubyHash$12.visit(RubyHash.java:1362)", "org.jruby.RubyHash$12.visit(RubyHash.java:1359)", "org.jruby.RubyHash.visitLimited(RubyHash.java:662)", "org.jruby.RubyHash.visitAll(RubyHash.java:647)", "org.jruby.RubyHash.iteratorVisitAll(RubyHash.java:1319)", "org.jruby.RubyHash.each_pairCommon(RubyHash.java:1354)", "org.jruby.RubyHash.each(RubyHash.java:1343)", "C_3a_.DevOps.Logstash.$6_dot_2_dot_4_dot_1.logstash_minus_6_dot_2_dot_4.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_filter_minus_mutate_minus_3_dot_3_dot_1.lib.logstash.filters.mutate.RUBY$method$rename$0(C:/DevOps/Logstash/6.2.4.1/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/logstash-filter-mutate-3.3.1/lib/logstash/filters/mutate.rb:277)", 
"C_3a_.DevOps.Logstash.$6_dot_2_dot_4_dot_1.logstash_minus_6_dot_2_dot_4.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_filter_minus_mutate_minus_3_dot_3_dot_1.lib.logstash.filters.mutate.RUBY$method$filter$0(C:/DevOps/Logstash/6.2.4.1/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/logstash-filter-mutate-3.3.1/lib/logstash/filters/mutate.rb:249)", "C_3a_.DevOps.Logstash.$6_dot_2_dot_4_dot_1.logstash_minus_6_dot_2_dot_4.logstash_minus_core.lib.logstash.filters.base.RUBY$method$do_filter$0(C:/DevOps/Logstash/6.2.4.1/logstash-6.2.4/logstash-core/lib/logstash/filters/base.rb:145)", "C_3a_.DevOps.Logstash.$6_dot_2_dot_4_dot_1.logstash_minus_6_dot_2_dot_4.logstash_minus_core.lib.logstash.filters.base.RUBY$block$multi_filter$1(C:/DevOps/Logstash/6.2.4.1/logstash-6.2.4/logstash-core/lib/logstash/filters/base.rb:164)", "org.jruby.runtime.CompiledIRBlockBody.yieldDirect(CompiledIRBlockBody.java:156)", "org.jruby.runtime.BlockBody.yield(BlockBody.java:114)", "org.jruby.runtime.Block.yield(Block.java:165)", "org.jruby.RubyArray.each(RubyArray.java:1734)", "C_3a_.DevOps.Logstash.$6_dot_2_dot_4_dot_1.logstash_minus_6_dot_2_dot_4.logstash_minus_core.lib.logstash.filters.base.RUBY$method$multi_filter$0(C:/DevOps/Logstash/6.2.4.1/logstash-6.2.4/logstash-core/lib/logstash/filters/base.rb:161)", "org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:103)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:163)", "org.jruby.ir.targets.InvokeSite.fail(InvokeSite.java:187)", "C_3a_.DevOps.Logstash.$6_dot_2_dot_4_dot_1.logstash_minus_6_dot_2_dot_4.logstash_minus_core.lib.logstash.filter_delegator.RUBY$method$multi_filter$0(C:/DevOps/Logstash/6.2.4.1/logstash-6.2.4/logstash-core/lib/logstash/filter_delegator.rb:47)", "org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:103)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:163)", 
"org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:200)", "org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:161)", "org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:314)", "org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:73)", "org.jruby.ir.interpreter.Interpreter.INTERPRET_BLOCK(Interpreter.java:132)", "org.jruby.runtime.MixedModeIRBlockBody.commonYieldPath(MixedModeIRBlockBody.java:148)", "org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:73)", "org.jruby.runtime.Block.call(Block.java:124)", "org.jruby.RubyProc.call(RubyProc.java:289)", "org.jruby.internal.runtime.methods.ProcMethod.call(ProcMethod.java:63)", "org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:204)", "org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:200)", "C_3a_.DevOps.Logstash.$6_dot_2_dot_4_dot_1.logstash_minus_6_dot_2_dot_4.logstash_minus_core.lib.logstash.pipeline.RUBY$method$filter_batch$0(C:/DevOps/Logstash/6.2.4.1/logstash-6.2.4/logstash-core/lib/logstash/pipeline.rb:445)", "C_3a_.DevOps.Logstash.$6_dot_2_dot_4_dot_1.logstash_minus_6_dot_2_dot_4.logstash_minus_core.lib.logstash.pipeline.RUBY$method$worker_loop$0(C:/DevOps/Logstash/6.2.4.1/logstash-6.2.4/logstash-core/lib/logstash/pipeline.rb:424)", "C_3a_.DevOps.Logstash.$6_dot_2_dot_4_dot_1.logstash_minus_6_dot_2_dot_4.logstash_minus_core.lib.logstash.pipeline.RUBY$method$worker_loop$0$__VARARGS__(C:/DevOps/Logstash/6.2.4.1/logstash-6.2.4/logstash-core/lib/logstash/pipeline.rb)", "org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:77)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:93)", "org.jruby.ir.targets.InvokeSite.invoke(InvokeSite.java:145)", 
"C_3a_.DevOps.Logstash.$6_dot_2_dot_4_dot_1.logstash_minus_6_dot_2_dot_4.logstash_minus_core.lib.logstash.pipeline.RUBY$block$start_workers$2(C:/DevOps/Logstash/6.2.4.1/logstash-6.2.4/logstash-core/lib/logstash/pipeline.rb:386)", "org.jruby.runtime.CompiledIRBlockBody.callDirect(CompiledIRBlockBody.java:145)", "org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:71)", "org.jruby.runtime.Block.call(Block.java:124)", "org.jruby.RubyProc.call(RubyProc.java:289)", "org.jruby.RubyProc.call(RubyProc.java:246)", "org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:104)", "java.lang.Thread.run(Thread.java:748)"], :thread=>"#<Thread:0x4e22bca7 sleep>"}

So the pipeline workers die, the process keeps running, and nothing gets processed.

This is different from the mapping errors I'd expect to see from ES. I believe the source data is the same, but I've had a very hard time proving it, because the exception gives no information about what was in the pipeline at the time. All I can say is that since dropping messages like the ones in my original post, it has been running seemingly without issues.
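One plausible reading of the trace above: it fails inside mutate's rename, which matches the rename map from the first post. Once event.status has been moved to [event][status] as a string, the next rename asks for id to be placed inside that string. The Ruby below is a hypothetical simulation of that ordering, not Logstash's own code: apply_renames applies the renames in order and skips (rather than raises on) any rename whose target path runs through a scalar.

```ruby
# Hypothetical simulation (not Logstash's implementation) of applying a
# mutate-style rename map, in order, to a flat event hash.
# Renames whose target path runs through a scalar are skipped and returned.
def apply_renames(event, renames)
  skipped = []
  renames.each do |source, target_path|
    next unless event.key?(source)
    *parents, leaf = target_path
    node = event
    blocked = false
    parents.each do |part|
      node[part] = {} unless node.key?(part)
      unless node[part].is_a?(Hash)
        blocked = true # parent already holds a scalar, e.g. a String
        break
      end
      node = node[part]
    end
    if blocked
      skipped << source
    else
      node[leaf] = event.delete(source)
    end
  end
  skipped
end

event = { "event.status" => "Running", "event.status.id" => 33 }
renames = [["event.status", ["event", "status"]],
           ["event.status.id", ["event", "status", "id"]]]
p apply_renames(event, renames) # ["event.status.id"]
p event # {"event.status.id"=>33, "event"=>{"status"=>"Running"}}
```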

