Help! Using Logstash to send an XML or JSON file to Elasticsearch

Hi everyone, can someone help me with this example? I have an XML file. I originally wanted to convert it to JSON and use Logstash to send it to Elasticsearch, but after many failed attempts I decided to keep the XML and use the xml filter instead. That failed too. I have tried many approaches, but each one only returns a "message" field containing a very long XML or JSON string. Here is my XML file:

<?xml version="1.0" encoding="UTF-8"?>
<List name="log-buffer"
    xmlns="http://bacnet.org/csml/1.2">
    <Sequence name="778000">
        <DateTime name="timestamp" value="2014-06-03T20:12:36.80"/>
        <Choice name="logDatum">
            <Real name="real-value" value="23.2418"/>
        </Choice>
        <BitString name="statusFlags" value=""/>
    </Sequence>
    <Sequence name="778001">
        <DateTime name="timestamp" value="2014-06-03T20:13:36.99"/>
        <Choice name="logDatum">
            <Real name="real-value" value="23.1019"/>
        </Choice>
        <BitString name="statusFlags" value=""/>
    </Sequence>
    <Sequence name="778099">
        <DateTime name="timestamp" value="2014-06-03T21:52:05.88"/>
        <Choice name="logDatum">
            <Real name="real-value" value="25.4676"/>
        </Choice>
        <BitString name="statusFlags" value=""/>
    </Sequence>
    <Sequence name="778100">
        <DateTime name="timestamp" value="2014-06-03T21:53:06.08"/>
        <Choice name="logDatum">
            <Real name="real-value" value="25.4554"/>
        </Choice>
        <BitString name="statusFlags" value=""/>
    </Sequence>
</List>

And here is the JSON file after conversion:

{
    "List": {
        "@name": "log-buffer",
        "@xmlns": "http://bacnet.org/csml/1.2",
        "Sequence": [
            {
                "@name": "778000",
                "DateTime": {
                    "@name": "timestamp",
                    "@value": "2014-06-03T20:12:36.80"
                },
                "Choice": {
                    "@name": "logDatum",
                    "Real": {
                        "@name": "real-value",
                        "@value": "23.2418"
                    }
                },
                "BitString": {
                    "@name": "statusFlags",
                    "@value": ""
                }
            },
            {
                "@name": "778001",
                "DateTime": {
                    "@name": "timestamp",
                    "@value": "2014-06-03T20:13:36.99"
                },
                "Choice": {
                    "@name": "logDatum",
                    "Real": {
                        "@name": "real-value",
                        "@value": "23.1019"
                    }
                },
                "BitString": {
                    "@name": "statusFlags",
                    "@value": ""
                }
            },
            {
                "@name": "778099",
                "DateTime": {
                    "@name": "timestamp",
                    "@value": "2014-06-03T21:52:05.88"
                },
                "Choice": {
                    "@name": "logDatum",
                    "Real": {
                        "@name": "real-value",
                        "@value": "25.4676"
                    }
                },
                "BitString": {
                    "@name": "statusFlags",
                    "@value": ""
                }
            },
            {
                "@name": "778100",
                "DateTime": {
                    "@name": "timestamp",
                    "@value": "2014-06-03T21:53:06.08"
                },
                "Choice": {
                    "@name": "logDatum",
                    "Real": {
                        "@name": "real-value",
                        "@value": "25.4554"
                    }
                },
                "BitString": {
                    "@name": "statusFlags",
                    "@value": ""
                }
            }
        ]
    }
}

Can someone help me? I'm a newbie

Or each piece is returned as an individual event.

If you are reading pretty-printed XML using a file input, then you can use a multiline codec. The Kibana data you show suggests you are reading each line separately.

file {
    path => ...
    codec => multiline { 
        pattern => "^</" 
        negate => true 
        what => next 
        auto_flush_interval => 2
    }
}

That will buffer lines that do not start with </ until it finds a line that does. Then it will flush all of those lines as a single event. You can then parse it using

xml { source => "message" target => "theXML" force_array => false }
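
To make the buffering rule concrete, here is a small Ruby model of what the codec does with negate => true and what => next. It is only a sketch of the grouping logic, not the codec's actual implementation, and `group_lines` is a hypothetical helper name. The sample lines are a shortened version of the file above.

```ruby
# Simplified model of the multiline codec with negate => true, what => next:
# a line that does NOT match the pattern is held back and joined to the
# lines that follow; the first matching line closes the event.
def group_lines(lines, pattern)
  events = []
  buffer = []
  lines.each do |line|
    buffer << line
    if line.match?(pattern)
      events << buffer.join("\n")
      buffer = []
    end
  end
  # The real codec relies on auto_flush_interval to emit a trailing buffer.
  events << buffer.join("\n") unless buffer.empty?
  events
end

lines = [
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<List name="log-buffer">',
  '    <Sequence name="778000">',
  '    </Sequence>',
  '</List>'
]

events = group_lines(lines, /^<\//)
puts events.length
```

The indented `</Sequence>` line does not match `^</` because it starts with whitespace, so only the closing `</List>` ends the event and the whole document arrives as one message.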


All my values end up in a single field. Is that expected, and is my configuration correct?

input {
  file {
    path => "/home/hoadd4/convert/test.xml" 
    start_position => "beginning" 
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "^<List/" 
      negate => true
      what => "previous"
      auto_flush_interval => 2
    }
  }
}

filter {
  xml {
    #remove_namespaces => "true"
    source => "message" 
    target => "List" 
    #store_xml => true 
    force_array => false
  }

  mutate {
    remove_field => ["message"]
  }

}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test" 
  }
}
Above is my logstash.conf.

I suspect that those are arrays. In Kibana, on the Discover tab, you can click on the JSON tab for one of these events and it should be clearer what the structure is.

You might find it worthwhile to write enough Ruby code to transform

            "BitString" => {
                "value" => "",
                 "name" => "statusFlags"
            },
             "DateTime" => {
                "value" => "2014-06-03T20:13:36.99",
                 "name" => "timestamp"
            },
               "Choice" => {
                "Real" => {
                    "value" => "23.1019",
                     "name" => "real-value"
                },
                "name" => "logDatum"
            },
                 "name" => "778001"
        },

into

{
"statusFlags" => ""
  "timestamp" => "2014-06-03T20:13:36.99"
   "logDatum" => 23.1019
       "name" => "778001"
}

but doing that will require a lot more detail about the definition of the format you are receiving.
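
As a rough illustration of that transformation, here is a minimal Ruby sketch. It assumes every child element carries a "name" attribute, that leaf elements carry a "value", and that a "Real" element always holds a decimal number. None of that is confirmed by a format definition. `unwrap` and `flatten_sequence` are hypothetical helper names.

```ruby
# Unwrap single-element arrays, which appear when the xml filter
# wraps every child element in an array.
def unwrap(value)
  value.is_a?(Array) && value.length == 1 ? value.first : value
end

# Turn one parsed <Sequence> hash into a flat hash keyed by each child's "name".
def flatten_sequence(seq)
  flat = { "name" => seq["name"] }
  seq.each do |_tag, child|
    child = unwrap(child)
    next unless child.is_a?(Hash) && child.key?("name")

    if child.key?("value")
      # Leaf such as DateTime or BitString: name => value.
      flat[child["name"]] = child["value"]
    else
      # Wrapper such as Choice: use the value of its nested element.
      child.each do |inner_tag, inner|
        inner = unwrap(inner)
        next unless inner.is_a?(Hash) && inner.key?("value")
        # Assumption: a "Real" element always holds a decimal number.
        flat[child["name"]] = inner_tag == "Real" ? Float(inner["value"]) : inner["value"]
      end
    end
  end
  flat
end

sequence = {
  "BitString" => { "value" => "", "name" => "statusFlags" },
  "DateTime"  => { "value" => "2014-06-03T20:13:36.99", "name" => "timestamp" },
  "Choice"    => { "Real" => { "value" => "23.1019", "name" => "real-value" }, "name" => "logDatum" },
  "name"      => "778001"
}

p flatten_sequence(sequence)
```

Inside a Logstash ruby filter, the same function could be applied to the array read with event.get and the result written back with event.set. The exact field reference depends on the target chosen in the xml filter.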

Is this it?

{
  "_index": "test",
  "_type": "_doc",
  "_id": "OFdmRYoB9j7M8YyR4_8g",
  "_version": 1,
  "_score": 1,
  "_source": {
    "List": {
      "name": "log-buffer-1",
      "xmlns": "http://bacnet.org/csml/1.2",
      "Sequence": [
        {
          "name": "778000",
          "Choice": [
            {
              "name": "logDatum",
              "Real": [
                {
                  "name": "real-value",
                  "value": "23.2418"
                }
              ]
            }
          ],
          "BitString": [
            {
              "name": "statusFlags",
              "value": ""
            }
          ],
          "DateTime": [
            {
              "name": "timestamp",
              "value": "2014-06-03T20:12:36.80"
            }
          ]
        },
        {
          "name": "778001",
          "Choice": [
            {
              "name": "logDatum",
              "Real": [
                {
                  "name": "real-value",
                  "value": "23.1019"
                }
              ]
            }
          ],
          "BitString": [
            {
              "name": "statusFlags",
              "value": ""
            }
          ],
          "DateTime": [
            {
              "name": "timestamp",
              "value": "2014-06-03T20:13:36.99"
            }
          ]
        },
        {
          "name": "778099",
          "Choice": [
            {
              "name": "logDatum",
              "Real": [
                {
                  "name": "real-value",
                  "value": "25.4676"
                }
              ]
            }
          ],
          "BitString": [
            {
              "name": "statusFlags",
              "value": ""
            }
          ],
          "DateTime": [
            {
              "name": "timestamp",
              "value": "2014-06-03T21:52:05.88"
            }
          ]
        },
        {
          "name": "778100",
          "Choice": [
            {
              "name": "logDatum",
              "Real": [
                {
                  "name": "real-value",
                  "value": "25.4554"
                }
              ]
            }
          ],
          "BitString": [
            {
              "name": "statusFlags",
              "value": ""
            }
          ],
          "DateTime": [
            {
              "name": "timestamp",
              "value": "2014-06-03T21:53:06.08"
            }
          ]
        }
      ]
    },
    "path": "/home/hoadd4/convert/test.xml",
    "@version": "1",
    "host": "localhost.localdomain",
    "@timestamp": "2023-08-30T07:44:44.005Z",
    "tags": [
      "multiline"
    ]
  },
  "fields": {
    "List.Sequence.Choice.Real.value.keyword": [
      "23.2418",
      "23.1019",
      "25.4676",
      "25.4554"
    ],
    "tags.keyword": [
      "multiline"
    ],
    "List.xmlns": [
      "http://bacnet.org/csml/1.2"
    ],
    "List.Sequence.DateTime.name": [
      "timestamp",
      "timestamp",
      "timestamp",
      "timestamp"
    ],
    "List.Sequence.Choice.name.keyword": [
      "logDatum",
      "logDatum",
      "logDatum",
      "logDatum"
    ],
    "List.Sequence.BitString.name": [
      "statusFlags",
      "statusFlags",
      "statusFlags",
      "statusFlags"
    ],
    "path": [
      "/home/hoadd4/convert/test.xml"
    ],
    "List.Sequence.BitString.value.keyword": [
      "",
      "",
      "",
      ""
    ],
    "List.Sequence.DateTime.name.keyword": [
      "timestamp",
      "timestamp",
      "timestamp",
      "timestamp"
    ],
    "host": [
      "localhost.localdomain"
    ],
    "List.Sequence.Choice.Real.name.keyword": [
      "real-value",
      "real-value",
      "real-value",
      "real-value"
    ],
    "@version": [
      "1"
    ],
    "host.keyword": [
      "localhost.localdomain"
    ],
    "List.Sequence.name.keyword": [
      "778000",
      "778001",
      "778099",
      "778100"
    ],
    "List.Sequence.Choice.name": [
      "logDatum",
      "logDatum",
      "logDatum",
      "logDatum"
    ],
    "@version.keyword": [
      "1"
    ],
    "List.Sequence.DateTime.value": [
      "2014-06-03T20:12:36.800Z",
      "2014-06-03T20:13:36.990Z",
      "2014-06-03T21:52:05.880Z",
      "2014-06-03T21:53:06.080Z"
    ],
    "List.xmlns.keyword": [
      "http://bacnet.org/csml/1.2"
    ],
    "tags": [
      "multiline"
    ],
    "List.Sequence.Choice.Real.name": [
      "real-value",
      "real-value",
      "real-value",
      "real-value"
    ],
    "@timestamp": [
      "2023-08-30T07:44:44.005Z"
    ],
    "List.Sequence.name": [
      "778000",
      "778001",
      "778099",
      "778100"
    ],
    "List.Sequence.BitString.value": [
      "",
      "",
      "",
      ""
    ],
    "List.Sequence.Choice.Real.value": [
      "23.2418",
      "23.1019",
      "25.4676",
      "25.4554"
    ],
    "List.name.keyword": [
      "log-buffer-1"
    ],
    "List.name": [
      "log-buffer-1"
    ],
    "path.keyword": [
      "/home/hoadd4/convert/test.xml"
    ],
    "List.Sequence.BitString.name.keyword": [
      "statusFlags",
      "statusFlags",
      "statusFlags",
      "statusFlags"
    ]
  }
}

Is there a way to separate them? Please help me.

That looks like output from Elasticsearch and I am not sure what it means (I do not run Elasticsearch), but it certainly looks like the field is an array.

Yes, it bundles together all the fields ending in .value across the items, even though the items have different names. What I want is for Elasticsearch to return each Sequence.name separately, structured like it is in _source.
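
One way to get a separate document per Sequence is to split the event before it is indexed, so that each array element becomes its own event. Logstash has a split filter for this, for example split { field => "[List][Sequence]" }, which would sit after the xml filter. The Ruby sketch below only models the effect on plain hashes. It is not the filter's implementation, and `split_event` is a hypothetical helper name.

```ruby
# Model of splitting one event that holds an array into one event per element.
# Each copy keeps the other top-level fields (path, host, ...) and replaces
# the array with a single element.
def split_event(event, outer, inner)
  items = event.dig(outer, inner)
  return [event] unless items.is_a?(Array)

  items.map do |item|
    copy = event.reject { |key, _| key == outer }
    copy[outer] = event[outer].merge(inner => item)
    copy
  end
end

event = {
  "path" => "/home/hoadd4/convert/test.xml",
  "List" => {
    "name" => "log-buffer-1",
    "Sequence" => [
      { "name" => "778000" },
      { "name" => "778001" }
    ]
  }
}

split_event(event, "List", "Sequence").each { |doc| p doc }
```

With one Sequence per document, the values in the "fields" section are no longer merged across items, because each document holds only a single Sequence object.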
