org.elasticsearch.hadoop.EsHadoopIllegalStateException: Position for field' not found in row

Hi there,
I have seen this issue reported before, and it still seems to apply.
I am using this version via pyspark:

from pyspark.sql import SparkSession

ss = (
    SparkSession.builder.appName("ES")
    .config("spark.driver.memory", "8g")
    .config(
        "spark.jars.packages",
        "org.elasticsearch:elasticsearch-spark-30_2.12:7.13.0",
    )
    .getOrCreate()
)

Printing the schema works:

root
 |-- @timestamp: timestamp (nullable = true)
 |-- AuthenticationPackage: string (nullable = true)
 |-- Destination: string (nullable = true)
 |-- DomainName: string (nullable = true)
 |-- EventID: long (nullable = true)
 |-- FailureReason: string (nullable = true)
 |-- LogHost: string (nullable = true)
 |-- LogonID: string (nullable = true)
 |-- LogonType: long (nullable = true)
 |-- LogonTypeDescription: string (nullable = true)
 |-- ParentProcessID: string (nullable = true)
 |-- ParentProcessName: string (nullable = true)
 |-- ProcessID: string (nullable = true)
 |-- ProcessName: string (nullable = true)
 |-- ServiceName: string (nullable = true)
 |-- Source: string (nullable = true)
 |-- Status: string (nullable = true)
 |-- SubjectDomainName: string (nullable = true)
 |-- SubjectLogonID: string (nullable = true)
 |-- SubjectUserName: string (nullable = true)
 |-- Time: long (nullable = true)
 |-- UserName: string (nullable = true)
 |-- event: struct (nullable = true)
 |    |-- category: string (nullable = true)
 |-- host: struct (nullable = true)
 |    |-- name: string (nullable = true)
 |-- message: string (nullable = true)
 |-- user: struct (nullable = true)
 |    |-- name: string (nullable = true)

However, any operation that actually reads data, such as show(), triggers the "Position for field not found in row" error.

The mappings are correct:

{'mappings': {'unified-host': {'properties': {'@timestamp': {'type': 'date'},
'AuthenticationPackage': {'type': 'keyword'},
'Destination': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'DomainName': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'EventID': {'type': 'long'},
'FailureReason': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'LogHost': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'LogonID': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'LogonType': {'type': 'long'},
'LogonTypeDescription': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'ParentProcessID': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'ParentProcessName': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'ProcessID': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'ProcessName': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'ServiceName': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'Source': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'Status': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'SubjectDomainName': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'SubjectLogonID': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'SubjectUserName': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'Time': {'type': 'long'},
'UserName': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
'event': {'properties': {'category': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}}},
'host': {'properties': {'name': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}}},
'message': {'type': 'keyword'},
'user': {'properties': {'name': {'type': 'text',
'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}}}}}}}}

Is this still related to the dot notation?
How can I work around it? Is there a way to exclude specific fields?
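On the question of excluding fields: ES-Hadoop documents an `es.read.field.exclude` setting (and a matching `es.read.field.include`) that takes a comma-separated list of field names to drop when reading. Below is a minimal sketch of passing it from pyspark. Whether it avoids this particular error is not confirmed here, and the host `localhost`, the index name `my-index`, and the helper names `build_es_read_options` and `read_index` are placeholders made up for illustration.

```python
def build_es_read_options(nodes, exclude_fields=()):
    """Build the option dict for an ES-Hadoop read.

    `nodes` and `exclude_fields` are illustrative parameters; the option
    keys themselves (es.nodes, es.read.field.exclude) are ES-Hadoop settings.
    """
    options = {"es.nodes": nodes}
    if exclude_fields:
        # ES-Hadoop expects a single comma-separated string of field names.
        options["es.read.field.exclude"] = ",".join(exclude_fields)
    return options


def read_index(spark, index, options):
    """Load an index as a DataFrame through the ES-Hadoop Spark SQL source."""
    reader = spark.read.format("org.elasticsearch.spark.sql")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load(index)


# Usage (needs a running cluster, so it is left commented out):
# opts = build_es_read_options("localhost", ["host.name", "user.name"])
# df = read_index(ss, "my-index", opts)
# df.show()
```

Excluding the nested `host`, `user` and `event` fields one at a time would also be a way to narrow down which field the connector cannot place in the row.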

By the way, the only solution that worked for me was to disable the ECS dot notation. That of course introduces other issues when doing correlation, but at least I am able to run queries through ES-Hadoop.
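Since disabling the dot notation made the reads work, one plausible explanation (an assumption, not something verified against the connector source) is that some documents store keys such as `"host.name"` as a single flat key in `_source`, while the mapping and the Spark schema describe `host` as an object with a `name` child. The sketch below checks a document for such keys and rewrites them as nested objects before indexing. Both function names are hypothetical, and lists of objects are not handled.

```python
def find_dotted_keys(doc, prefix=""):
    """Return the paths of all keys that contain a literal dot."""
    hits = []
    for key, value in doc.items():
        # Use "/" between levels so a dot inside a key stays visible.
        path = f"{prefix}/{key}" if prefix else key
        if "." in key:
            hits.append(path)
        if isinstance(value, dict):
            hits.extend(find_dotted_keys(value, path))
    return hits


def expand_dotted_keys(doc):
    """Rewrite {"host.name": "a"} as {"host": {"name": "a"}}, recursively."""
    out = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            value = expand_dotted_keys(value)
        parts = key.split(".")
        target = out
        for part in parts[:-1]:
            # Assumes no scalar already sits at this path.
            target = target.setdefault(part, {})
        last = parts[-1]
        if isinstance(value, dict) and isinstance(target.get(last), dict):
            target[last].update(value)  # merge with an existing object
        else:
            target[last] = value
    return out


sample = {"host.name": "pc1", "user": {"name": "bob"}, "event.category": "auth"}
print(find_dotted_keys(sample))
print(expand_dotted_keys(sample))
```

Running `find_dotted_keys` over the `_source` of a handful of raw documents would show whether flat dotted keys are present at all. If they are, applying `expand_dotted_keys` in the ingest path keeps the ECS field names while storing them as nested objects.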
