Xml filter for dynamic xml


(Peter Skarmyr) #1

Hi

I am still new to elasticsearch and are having som trouble importing xml-data and structuring this in a way that allows me to do meaningful aggeregations in kibana on the data from the xml.
I am importing data from an sql-database. One of the columns in the table contains dynamic xml similar to this:

<collection>
    <row>
        <field name="Price" type="number" value="1549" />
        <field name="Marked" type="text" value="Completed" />
        <field name="TransactionDatetime" type="datetime" value="2018-11-07 20:00:00Z" />
        <field name="Tag" type="text" value="A-Commercial" />
    </row>
</collection>

The number of fields and the attributes "name" and "type" will be static for "groups" within the index. Eg. I have my index "myindex" containing 100 000 documents containing a field "group" which contains a number between 1 and 100. All documents with group=1 will have the same number of fields with the same "name" and "type", but a different "value" attribute for each document. Same goes for groups 2-100.

In logstash I have used the xml-filter to import this data:

xml {
    force_array => false
    source => "data"
    target => "xmlitemdata"
}

This imports the data and creates a field xmlitemdata which looks like this in kibana:

"xmlitemdata": {
  "row": {
    "field": [
      {
        "name": "Price",
        "value": "1549",
        "type": "number"
      },
      {
        "name": "Marked",
        "value": "Completed",
        "type": "text"
      },
      {
        "name": "TransactionDatetime",
        "value": "2018-11-07 20:00:00Z",
        "type": "datetime"
      },
      {
        "name": "Tag",
        "value": "A-Commercial",
        "type": "text"
      }
    ]
  }
}

From this data I would like to make visualisations. Eg. A bar diagram where you count the number of documents in different price ranges. This does not seem to be possible because you can only make aggregations on fields "name" and "value" seperately. So I can make an aggregation based on the field value and I am not sure how to filter out values belonging to a different name like Tag. Another issue is that the datatype is always text and not number. I have considered a few possible solutions, but I am unsure of how to implement this. That is why I have made this forum post and put it in the logstash section, because I think the best solution is to change the way data is importet in logstash.

Option 1 - Restructuring how the data is stored in document field by applying filters in logstash
From my short experience with kibana I think I could easly achieve the visualisations I want by strucuting the data in the field xmlitemdata in a way that allows me to perform aggregations on the correct values from the xml. One possible structure that allows this is somthing similar to this:

    "xmlitemdata": {
    "row": {
      "field": [
        {
          "Price": 1549
        },
        {
          "Marked": "Completed"
        },
        {
          "TransactionDatetime": "2018-11-07T20:00:00.000Z",
        },
        {
          "Tag": "A-Commercial",
        }
      ]
    }
  }

Could anyone point me in the right direction on how to achieve this? How can I make sure that the data also gets the correct datatype like number and datetime instead of string?

Option 2 - Splitting the data into more indecies and adding fields directly to document
Another option is to split the my index into one index per group and add the fields like Price, Marked, TransactionDatetime and Tag as fields in the document. I think that this is probably not a good idea because of the following reasons:

  • All groups have about 30 other fields not related to the xml which have the same names and datatypes across all groups. Therefore it makes sense to have one index for all groups I think.
  • Some groups are small so you would have indecies not containing much data.
  • The number of groups might increase over time.
  • Many of our visualisations would use a index pattern in kibana that gets data from all the groups' indecies. This is probably not great for performance.

Since I am new to elastic I am not sure how much of a impact all of these "problems" have. So if you think this option is better, how could this be achieved using logstash? In particular how can you insert fields in the document with names from the xml and how can you ensure these have the correct datatype?


(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.