How to look up data in Elasticsearch as part of an ingest node pipeline


(Ant) #1

Hi,

To start, my level of knowledge with this kind of thing is very entry level. The overall task is this: data in Apache Hive is being pushed to an Elastic index (that bit is already working), and I need to enrich it. Each document contains an identifier field, and I need to add an extra field to every document with a more human-friendly name based on the ID number passed. The lookup data all lives in an index on the same cluster.
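For illustration, the transformation I'm after looks roughly like this (field names as used elsewhere in this post; the name value is made up):

Before:

{ "entityId": 100 }

After:

{ "entityId": 100, "entity_name": "Example Entity" }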

Originally I was told to use Logstash to enrich the data, but while setting about this task I found there is an ingest node option whereby you update the data before it is even indexed, which seems far more efficient.

What I'm struggling with is how to take the ID field of the incoming request, look up the human-friendly name, and then insert that into the document before it is indexed.

I know the query: if I pick a single record, I can use the Dev Tools in Kibana to write a query that returns the single document I need for the enrichment:

GET entity_data/_search
{
  "query": {
    "match": {
      "entityId": 100
    }
  }
}

I'm just struggling to find out how I actually use that in a pipeline so I can inject the entity_name into the document.

If anyone can give me any guidance or point me to the resource I need to read through, it would be much appreciated.

Kind regards
Ant


(Magnus Kessler) #2

This type of enrichment can be achieved with the Logstash translate filter. There is no equivalent in Elasticsearch ingest processors.

Using the translate filter with a dictionary file would be more performant than looking up values at runtime from either Elasticsearch or some other database. The dictionary file can be updated periodically, and Logstash will pick up changes without a restart.
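For anyone following along, a minimal sketch of that approach (the file path, refresh interval, and dictionary contents are assumptions; note that option names vary by plugin version — newer releases of the translate filter use source/target, older ones field/destination):

filter {
  translate {
    # field holding the ID to look up
    source           => "entityId"
    # field to write the human-friendly name into
    target           => "entity_name"
    # dictionary file; Logstash re-reads it periodically without a restart
    dictionary_path  => "/etc/logstash/entity_names.yml"
    refresh_interval => 300
    # value to use when an ID is missing from the dictionary
    fallback         => "unknown"
  }
}

The dictionary file itself is a simple YAML map from IDs to names, one entry per line, e.g. "100": "Example Entity".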


(Ant) #3

Hi @Magnus_Kessler ,

Thanks for the reply, at least I know why I've been struggling to find it now. Given that I also want to apply rollovers on the index, as it will get quite large, would I be best to write the data to a holding index and then use Logstash to enrich it and move it to the permanent index, or to have Logstash enrich it in place?

My thinking is that if Logstash is enriching the active index, I might hit an issue where it rolls over before the entries in the previous index have been enriched.

Kind regards
Ant


(system) closed #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.