Resolved attribute is missing ... but it is not!

I am on ELK 7.13.0 with the elasticsearch-hadoop (ES-Hadoop) 7.13.0 library.
I have the following schema:

root
|-- @timestamp: timestamp (nullable = true)
|-- AuthenticationPackage: string (nullable = true)
|-- Destination: string (nullable = true)
|-- DomainName: string (nullable = true)
|-- EventID: long (nullable = true)
|-- FailureReason: string (nullable = true)
|-- LogHost: string (nullable = true)
|-- LogonID: string (nullable = true)
|-- LogonType: long (nullable = true)
|-- LogonTypeDescription: string (nullable = true)
|-- ParentProcessID: string (nullable = true)
|-- ParentProcessName: string (nullable = true)
|-- ProcessID: string (nullable = true)
|-- ProcessName: string (nullable = true)
|-- ServiceName: string (nullable = true)
|-- Source: string (nullable = true)
|-- Status: string (nullable = true)
|-- SubjectDomainName: string (nullable = true)
|-- SubjectLogonID: string (nullable = true)
|-- SubjectUserName: string (nullable = true)
|-- Time: long (nullable = true)
|-- UserName: string (nullable = true)

When I try a simple filter operation:

from pyspark.sql import SparkSession

ss = (
    SparkSession.builder.master("local[24]").appName("ES")
    .config("spark.driver.memory", "8g")
    .getOrCreate()
)

es_reader = (
    ss.read.format("org.elasticsearch.spark.sql")
    .option("inferSchema", "false")
    .option("es.read.field.as.array.include", "tags")
    .option("es.nodes", "elasticsearch:9200")
    .option("es.net.http.auth.user", "elastic")
    .option("es.net.http.auth.pass", "123")
    .option("es.net.ssl", "true")
    .option("es.net.ssl.cert.allow.self.signed", "true")
)

small_df = es_reader.load("priam_unified_host-{0}/unified-host".format(date))
small_df.filter(small_df.EventID == 4688).explain(extended=True)

I am getting the following error, as if EventID did not exist, but every document in that index does in fact have the EventID field populated.

AnalysisException: Resolved attribute(s) EventID#559L missing from @timestamp#203,AuthenticationPackage#204,Destination#205,DomainName#206,EventID#207L,FailureReason#208,LogHost#209,LogonID#210,LogonType#211L,LogonTypeDescription#212,ParentProcessID#213,ParentProcessName#214,ProcessID#215,ProcessName#216,ServiceName#217,Source#218,Status#219,SubjectDomainName#220,SubjectLogonID#221,SubjectUserName#222,Time#223L,UserName#224 in operator !Filter (EventID#559L = cast(4688 as bigint)). Attribute(s) with the same name appear in the operation: EventID. Please check if the right attribute(s) are used.;
!Filter (EventID#559L = cast(4688 as bigint))

Any idea why this is happening?
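
For reference, this kind of AnalysisException usually means the Column object passed to filter() was resolved against a different DataFrame plan than the one being filtered; the plan above shows the mismatch (EventID#559L in the filter vs. EventID#207L in the relation). A minimal sketch of the pattern that typically triggers it, assuming the same index is simply loaded twice:

# Hypothetical reproduction: each load() builds its own logical plan, so a
# column taken from the first DataFrame cannot be resolved against the second.
df_a = es_reader.load("priam_unified_host-{0}/unified-host".format(date))
df_b = es_reader.load("priam_unified_host-{0}/unified-host".format(date))

# df_a.EventID carries df_a's attribute IDs; filtering df_b with it raises
# "Resolved attribute(s) ... missing" even though the field exists in both.
df_b.filter(df_a.EventID == 4688)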

Interestingly, it seems it doesn't like that syntax; this works:

small_df.where('EventID == 4688').limit(10).explain(extended=True)
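
The string predicate works because Spark resolves "EventID" by name against small_df's own plan during analysis, rather than reusing an already-bound attribute. An unbound col() reference should behave the same way; a small sketch, not verified against this exact setup:

from pyspark.sql import functions as F

# F.col() builds an unresolved column reference that is only bound to the
# DataFrame it is applied to, so it avoids attribute-ID mismatches.
small_df.filter(F.col("EventID") == 4688).limit(10).explain(extended=True)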

Maybe worth adding to the documentation examples.
