I am on the Elastic Stack (ELK) 7.13.0 with the elasticsearch-hadoop 7.13.0 library.
I have the following schema:
root
|-- @timestamp: timestamp (nullable = true)
|-- AuthenticationPackage: string (nullable = true)
|-- Destination: string (nullable = true)
|-- DomainName: string (nullable = true)
|-- EventID: long (nullable = true)
|-- FailureReason: string (nullable = true)
|-- LogHost: string (nullable = true)
|-- LogonID: string (nullable = true)
|-- LogonType: long (nullable = true)
|-- LogonTypeDescription: string (nullable = true)
|-- ParentProcessID: string (nullable = true)
|-- ParentProcessName: string (nullable = true)
|-- ProcessID: string (nullable = true)
|-- ProcessName: string (nullable = true)
|-- ServiceName: string (nullable = true)
|-- Source: string (nullable = true)
|-- Status: string (nullable = true)
|-- SubjectDomainName: string (nullable = true)
|-- SubjectLogonID: string (nullable = true)
|-- SubjectUserName: string (nullable = true)
|-- Time: long (nullable = true)
|-- UserName: string (nullable = true)
When I try a simple filter operation:
from pyspark.sql import SparkSession

ss = (
    SparkSession.builder.master('local[24]').appName("ES")
    .config("spark.driver.memory", "8g")
    .getOrCreate()
)
es_reader = (
    ss.read
    .format("org.elasticsearch.spark.sql")
    .option("inferSchema", "false")
    .option("es.read.field.as.array.include", "tags")
    .option("es.nodes", "elasticsearch:9200")
    .option("es.net.http.auth.user", "elastic")
    .option("es.net.http.auth.pass", "123")
    .option("es.net.ssl", "true")
    .option("es.net.ssl.cert.allow.self.signed", "true")
)
small_df = es_reader.load("priam_unified_host-{0}/unified-host".format(date))
small.filter(small_df.EventID == 4688).explain(extended=True)
I get the following error, as if EventID did not exist, even though every document in that index has the EventID field populated:
AnalysisException: Resolved attribute(s) EventID#559L missing from @timestamp#203,AuthenticationPackage#204,Destination#205,DomainName#206,EventID#207L,FailureReason#208,LogHost#209,LogonID#210,LogonType#211L,LogonTypeDescription#212,ParentProcessID#213,ParentProcessName#214,ProcessID#215,ProcessName#216,ServiceName#217,Source#218,Status#219,SubjectDomainName#220,SubjectLogonID#221,SubjectUserName#222,Time#223L,UserName#224 in operator !Filter (EventID#559L = cast(4688 as bigint)). Attribute(s) with the same name appear in the operation: EventID. Please check if the right attribute(s) are used.;
!Filter (EventID#559L = cast(4688 as bigint))
Any idea why this is happening?