Revisiting "lookup table" concept

I've read all of the previous topics about this and done some searching, but I can't figure it out, so I'll ask specifically about my situation.
I have one cluster that is explicitly for reporting and visualizations. I'd like to use data from a connected remote cluster for a visualization, and that's easy enough.
But the data in the remote cluster cannot be denormalized, and we cannot re-index it, so the visualizations are working with IDs instead of names.
I created another index on the local cluster where each document's ID matches the IDs from the remote cluster's documents, with a name field.
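For concreteness, a document in that lookup index looks something like this (the index name, ID, and field name here are just illustrative):

PUT /namelookup-example/_doc/exampledocid123
{
  "Name": "Example Corp"
}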
Is there any way to script a dynamic field on the local cluster's index pattern so that visualizations can use the names? I've looked through the Painless documentation but could not find anything. Hopefully I'm just missing something, because I could unlock a lot of value with this simple ID lookup.
We are on v7.16.3.

What about Runtime fields (Elasticsearch Guide [7.16])?
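The basic mapping example from that page looks like this (the index and field names are the docs' own):

PUT my-index-000001/
{
  "mappings": {
    "runtime": {
      "day_of_week": {
        "type": "keyword",
        "script": {
          "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
        }
      }
    },
    "properties": {
      "@timestamp": { "type": "date" }
    }
  }
}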

A runtime field would be set up on the mapping of the index in the remote cluster, right?
I would be able to populate the lookup index on that cluster as well, but I haven't been able to find any example of either a runtime or dynamic field that gets its value from another index.
I do have the ID of the document in the other index, so I'm hoping it should be something simple like
emit(['/namelookup-example/_doc/exampledocid123'].doc['Name'].value)
or something!

I realized what I probably want to do is store a parameterized Painless script that takes an ID and returns a Name, and use that script for the scripted field. And now I've run into another problem!
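The stored-script API shape would be something like this (the script ID and body are made up; a pure-params lookup like this would only work if the whole ID-to-name table were small enough to pass in as a parameter):

PUT _scripts/id-to-name
{
  "script": {
    "lang": "painless",
    "source": "params.lookup.containsKey(params.id) ? params.lookup[params.id] : params.id"
  }
}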

When I run the base example from the docs, modified for my index:

POST /_scripts/painless/_execute
{
  "script": {
    "source": """
      emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT));
    """
  },
  "context": "keyword_field",
  "context_setup": {
    "index": "namelookup-corp",
    "document": {
      "@timestamp": "2020-04-30T14:31:43-05:00"
    }
  }
}

I get:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_access_error",
        "reason" : "illegal_access_error: failed to access class org.objectweb.asm.Edge from class org.objectweb.asm.MethodWriter (org.objectweb.asm.Edge is in unnamed module of loader 'app'; org.objectweb.asm.MethodWriter is in unnamed module of loader java.net.FactoryURLClassLoader @4d8539de)"
      }
    ],
    "type" : "script_exception",
    "reason" : "compile error",
    "script_stack" : [ ],
    "script" : "\n      emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT));\n    ",
    "lang" : "painless",
    "caused_by" : {
      "type" : "illegal_access_error",
      "reason" : "illegal_access_error: failed to access class org.objectweb.asm.Edge from class org.objectweb.asm.MethodWriter (org.objectweb.asm.Edge is in unnamed module of loader 'app'; org.objectweb.asm.MethodWriter is in unnamed module of loader java.net.FactoryURLClassLoader @4d8539de)"
    }
  },
  "status" : 400
}

We are aware of Announcement 2021-001 and have already masked log4j-cve-2021-44228-hotpatch and restarted all nodes in the cluster. I confirmed on an individual node that the service was masked and not running, restarted the node again, and ran the script against localhost with curl, but I still get the same error.
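(The localhost check was just the same _execute request via curl, along these lines — the credentials and file name are placeholders:)

curl -s -u elastic:<password> -H 'Content-Type: application/json' \
  -X POST 'http://localhost:9200/_scripts/painless/_execute' \
  --data-binary @execute-test.json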
I've even been able to run more basic Painless emits as dynamic fields!

This is not really Kibana-specific anymore, and perhaps I should start a new topic, but this has me baffled.

And now that I've spent more time digging, I'm not sure how to do this at all, since scripts always seem to be executed in the context of a given index and document.

(I've got years of experience with a bunch of Elastic things, but obviously scripting is not one of those things!)

I don't believe you can run scripts that pull data from one index into another; it's not something I have ever seen done, and I can think of a few good reasons why we wouldn't allow it anyway.

Can you pull the data in from the remote cluster (e.g. via CCS or CCR) and run a transform job or something similar to create a new index with the data you want?

In some use cases for this, yes I can. In others, I have billions of docs across hundreds of terabytes of storage, so it isn't feasible, and I was REALLY hoping I could do this with the annotate-on-read concept.
If I understand correctly, it seems like I wouldn't be able to do an update with a script for this either, and would have to reindex to accomplish it. Does that sound right?

Yeah, I think that's where you're needing to head, unfortunately.
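For reference, a sketch of that path using an enrich policy plus reindex-from-remote (all names here are illustrative; note that an enrich policy matches on a regular field, so the lookup index would need the ID copied into a field like doc_id rather than only living in _id, and reindex-from-remote requires the remote host to be listed in reindex.remote.whitelist):

PUT _enrich/policy/id-to-name
{
  "match": {
    "indices": "namelookup-corp",
    "match_field": "doc_id",
    "enrich_fields": ["Name"]
  }
}

POST _enrich/policy/id-to-name/_execute

PUT _ingest/pipeline/add-name
{
  "processors": [
    {
      "enrich": {
        "policy_name": "id-to-name",
        "field": "corp_id",
        "target_field": "corp"
      }
    }
  ]
}

POST _reindex
{
  "source": {
    "remote": { "host": "https://remote-cluster:9200" },
    "index": "source-index"
  },
  "dest": {
    "index": "reporting-copy",
    "pipeline": "add-name"
  }
}

The enriched documents then carry corp.Name locally, so visualizations can use names directly — at the cost of the names being frozen at reindex time.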

That is unfortunate, since names can change, and that would make historical reporting weird. Kibana may not be the one-stop shop I was hoping for here.
Regardless, thank you for your time!
I am still very curious what is happening with the illegal_access_error, but it doesn't seem like it's blocking me.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.