Convert text field containing XML string

Hi,

I am pretty new to Elasticsearch and I have a problem that I couldn't solve even with Google. Therefore, I would really appreciate any help.

I use a tool (log2timeline) to create an index and add data to it. I am not able to alter this process. Log2timeline stores an XML structure in a field of type text called xml_string. Here is the relevant part of the mapping:

"xml_string": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}

I would like to search each XML tag like an Elasticsearch field, because I use Kibana to analyze the data.
Therefore, I would like to add an object field to the index that contains the parsed XML. I have figured out that I can alter the documents in the index by using _update_by_query, but I don't know how to convert the string to JSON automatically.

I found the Logstash xml filter plugin, but as far as I understand, it can't be used in Elasticsearch queries.

Is there any way to perform this task directly in elasticsearch?

For clarification purposes, I have added an example of the xml_string content:

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing" Guid="{54849625-5478-4994-A5BA-3E3B0328C30D}"/>
    <EventID>4624</EventID>
    <Version>1</Version>
    <Level>0</Level>
    <Task>12544</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8020000000000000</Keywords>
    <TimeCreated SystemTime="2015-12-12T03:30:35.230632300Z"/>
    <EventRecordID>331</EventRecordID>
    <Correlation/>
    <Execution ProcessID="500" ThreadID="520"/>
    <Channel>Security</Channel>
    <Computer>sensei</Computer>
    <Security/>
  </System>
  <EventData>
    <Data Name="SubjectUserSid">S-1-5-18</Data>
    <Data Name="SubjectUserName">SENSEI$</Data>
    <Data Name="SubjectDomainName">WORKGROUP</Data>
    <Data Name="SubjectLogonId">0x00000000000003e7</Data>
    <Data Name="TargetUserSid">S-1-5-19</Data>
    <Data Name="TargetUserName">LOCAL SERVICE</Data>
    <Data Name="TargetDomainName">NT AUTHORITY</Data>
    <Data Name="TargetLogonId">0x00000000000003e5</Data>
    <Data Name="LogonType">5</Data>
    <Data Name="LogonProcessName">Advapi  </Data>
    <Data Name="AuthenticationPackageName">Negotiate</Data>
    <Data Name="WorkstationName"/>
    <Data Name="LogonGuid">{00000000-0000-0000-0000-000000000000}</Data>
    <Data Name="TransmittedServices">-</Data>
    <Data Name="LmPackageName">-</Data>
    <Data Name="KeyLength">0</Data>
    <Data Name="ProcessId">0x00000000000001ec</Data>
    <Data Name="ProcessName">C:\Windows\System32\services.exe</Data>
    <Data Name="IpAddress">-</Data>
    <Data Name="IpPort">-</Data>
    <Data Name="ImpersonationLevel">%%1833</Data>
  </EventData>
</Event>

Thanks for your help!

Dennis

I'm wondering if this community plugin could help you.

FWIW, in the end you need to create a JSON document before indexing if you want to be able to search on specific field names.

FSCrawler also supports xml parsing. See https://fscrawler.readthedocs.io/en/latest/admin/fs/local-fs.html#indexing-xml-docs
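Creating the JSON before indexing, as mentioned above, could be done on the client side. Here is a minimal sketch in Python that only uses the standard library and turns an event like the one in the question into a nested dict. The function names and the choice to key each Data element by its Name attribute are assumptions made for illustration, not something log2timeline or Elasticsearch provides.

```python
import json
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def event_to_dict(xml_string):
    """Convert a Windows event XML string into a nested dict (hypothetical helper)."""
    root = ET.fromstring(xml_string)
    result = {"System": {}, "EventData": {}}
    for section in root:
        name = local_name(section.tag)
        if name == "System":
            for child in section:
                key = local_name(child.tag)
                text = (child.text or "").strip()
                if text:
                    # Elements with text keep the text only (attributes are dropped).
                    result["System"][key] = text
                elif child.attrib:
                    # Attribute-only elements such as <Execution .../> become objects.
                    result["System"][key] = dict(child.attrib)
                # Empty elements without attributes (e.g. <Security/>) are skipped.
        elif name == "EventData":
            for data in section:
                field = data.get("Name")
                if field:
                    result["EventData"][field] = (data.text or "").strip()
    return result


# Shortened version of the event from the question.
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing" Guid="{54849625-5478-4994-A5BA-3E3B0328C30D}"/>
    <EventID>4624</EventID>
    <Execution ProcessID="500" ThreadID="520"/>
    <Channel>Security</Channel>
    <Security/>
  </System>
  <EventData>
    <Data Name="TargetUserName">LOCAL SERVICE</Data>
    <Data Name="LogonType">5</Data>
    <Data Name="WorkstationName"/>
  </EventData>
</Event>"""

if __name__ == "__main__":
    print(json.dumps(event_to_dict(SAMPLE), indent=2))
```

All values stay strings in this sketch, so numeric fields such as EventID would still need an explicit conversion or mapping if they should be treated as numbers.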

Hi David,

thank you for the reply. I have looked into both suggestions.

  1. elasticsearch-ingest-xml

As far as I understood, I would add this plugin to the elasticsearch ingestion pipeline.

Unfortunately, the gradle build fails (maybe it is too old):


2020-06-27T07:28:42.367+0000 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]
2020-06-27T07:28:42.367+0000 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] FAILURE: Build failed with an exception.
2020-06-27T07:28:42.367+0000 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]
2020-06-27T07:28:42.367+0000 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] * Where:
2020-06-27T07:28:42.367+0000 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] Build file '/root/elasticsearch-ingest-xml/build.gradle' line: 17
2020-06-27T07:28:42.367+0000 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]
2020-06-27T07:28:42.367+0000 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] * What went wrong:
2020-06-27T07:28:42.367+0000 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] A problem occurred evaluating root project 'ingest-xml'.
2020-06-27T07:28:42.367+0000 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] > Failed to apply plugin [id 'carrotsearch.randomized-testing']
2020-06-27T07:28:42.367+0000 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] > Removing tasks from the task container is not supported. Disable the tasks or use replace() instead.
2020-06-27T07:28:42.367+0000 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]
2020-06-27T07:28:42.367+0000 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] * Try:
2020-06-27T07:28:42.367+0000 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] Run with --stacktrace option to get the stack trace. Run with --scan to get full insights.
2020-06-27T07:28:42.368+0000 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]
2020-06-27T07:28:42.368+0000 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] * Get more help at https://help.gradle.org
2020-06-27T07:28:42.368+0000 [ERROR] [org.gradle.internal.buildevents.BuildResultLogger]
2020-06-27T07:28:42.368+0000 [ERROR] [org.gradle.internal.buildevents.BuildResultLogger] BUILD FAILED in 2s

  2. FSCrawler also supports xml parsing

Could you elaborate on how to activate this for the field xml_string?

Thanks for the support! It is really appreciated.

Dennis

1/ Are you trying to build the plugin yourself? If so, then yes, you need a build for your specific version. Maybe ask the author?

2/ FSCrawler reads files on disk. Not fields or anything like that.

Hi David,

thanks for the quick reply.

  1. Yes, I tried to build it using gradle and the build instructions. Are there any repositories with prebuilt versions? I am running version 7.8.0.

  2. Okay, then this will not work, because this is just an XML string.

Just a general idea: I have taken a closer look at the program that does the upload to Elasticsearch. If I change it so that the field xml_string is populated with a JSON structure instead of an XML structure, does Elasticsearch detect this automatically, or do I need to adjust the mapping as well?

Thanks!

Dennis

No. You need to create a proper json document. The mapping won't help here IMHO.
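To illustrate the point about a proper JSON document: a JSON string stored in a text field is still one text value, while a nested object in the document body gets its own sub-fields. The sketch below contrasts the two and builds the two NDJSON lines of a `_bulk` update action for an existing document. The index name `my-index`, the document id, and the target field `event` are hypothetical placeholders. Using a new field rather than xml_string is an assumption, made because xml_string is already mapped as text and would most likely reject an object value.

```python
import json

# Parsed event as a plain dict (shortened; values taken from the example event).
parsed = {
    "System": {"EventID": "4624", "Channel": "Security"},
    "EventData": {"TargetUserName": "LOCAL SERVICE", "LogonType": "5"},
}

# Variant A: JSON serialized into the text field. Elasticsearch still sees a
# single string here, so there is no field like System.EventID to search on.
as_string = {"xml_string": json.dumps(parsed)}

# Variant B: the parsed structure is part of the document itself. With dynamic
# mapping this yields sub-fields such as event.System.EventID.
as_object = {"event": parsed}


def bulk_update_lines(index, doc_id, parsed_event, target_field="event"):
    """Build the NDJSON lines of one _bulk 'update' action (hypothetical helper).

    The partial document adds `target_field` to the existing document and
    leaves the original xml_string field untouched.
    """
    action = {"update": {"_index": index, "_id": doc_id}}
    body = {"doc": {target_field: parsed_event}}
    # The bulk body is newline-delimited JSON and must end with a newline.
    return json.dumps(action) + "\n" + json.dumps(body) + "\n"


if __name__ == "__main__":
    print(type(as_string["xml_string"]).__name__)  # str
    print(type(as_object["event"]).__name__)       # dict
    print(bulk_update_lines("my-index", "some-doc-id", parsed), end="")
```

The resulting payload would be sent to the `_bulk` endpoint with the content type `application/x-ndjson`. Fetching the existing documents and their ids is left out of this sketch.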