Setting an ingest pipeline in Machine Learning

Hi all,

Hoping this is an easy question!

I have created an enrichment policy that adds an additional field to documents based on whether the geo-point within the document is within a geo-shape from the source index. This all works fine when testing a few documents in the Console. The example I followed can be found here - https://www.elastic.co/guide/en/elasticsearch/reference/master/geo-match-enrich-policy-type.html
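
For reference, the setup from that example looks roughly like this (the index, policy, and field names come from the docs example, not my real data):

PUT /postal_codes
{
  "mappings": {
    "properties": {
      "location": { "type": "geo_shape" },
      "postal_code": { "type": "keyword" }
    }
  }
}

PUT /_enrich/policy/postal_policy
{
  "geo_match": {
    "indices": "postal_codes",
    "match_field": "location",
    "enrich_fields": ["location", "postal_code"]
  }
}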

I now want to ingest a larger number of documents by importing a CSV through the Machine Learning data visualizer. I have imported the CSV file, set the index name, and set the correct mappings. However, when it comes to the Ingest Pipeline field, I am at a loss as to what I need to put in there.

It evidently needs some sort of properly formatted definition, but exactly what, I do not know. Apologies, I am just starting my ELK journey and I'm not very familiar with the query language used!

Any help greatly appreciated :slight_smile: Thanks!

Hey @benji87,

First of all, great name!

I am guessing you are referring to the CSV upload feature in the data visualization section of ML.

In the pipeline section, you can simply add your enrichment processor as the last processor.

"processors": [ // Might already be present in the pipeline section
... //processors defined automatically by ml (if present)    
{// Your enrich processor
      "enrich": {
        "policy_name": "postal_policy",
        "field": "geo_location",
        "target_field": "geo_data",
        "shape_relation": "INTERSECTS"
      }
    }
  ]

I am curious, how were you testing via the Kibana Console? I am guessing you created your pipeline and used _simulate. If you did, then you should be able to copy-paste your processor definition into the pipeline in the CSV Uploader.
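
For example, a quick _simulate call like this (using your enrich processor; the test doc and its geo_location value are just stand-ins, and the policy needs to have been executed first so the enrich index exists):

POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "enrich": {
          "policy_name": "postal_policy",
          "field": "geo_location",
          "target_field": "geo_data",
          "shape_relation": "INTERSECTS"
        }
      }
    ]
  },
  "docs": [
    { "_source": { "geo_location": "POINT (13.5 52.5)" } }
  ]
}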

Thanks!

Ben

Hi @BenTrent,

Thanks! Yours ain't bad either :smile:

I am guessing you are referring to the CSV upload feature in the data visualization section of ML.

Yes, that's right. Here's a screenshot of the menu and what I have configured (mapping blanked intentionally) -

I've added what you suggested; however, I'm getting the following error on import -

Not quite sure which part it's not happy with! Do you have any ideas? Thanks again :slight_smile:

Oh, also in answer to this question -

I am curious, how were you testing via the Kibana Console? I am guessing you created your pipeline and used _simulate. If you did, then you should be able to copy-paste your processor definition into the pipeline in the CSV Uploader.

I just followed the instructions in the example I posted earlier, which involved using PUT commands to create the geo-shape index and the enrich policy, and then indexing a document with the enrich pipeline specified on the request. :slight_smile:
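
Roughly, the remaining steps from the docs looked like this (same hypothetical names as the docs example):

POST /_enrich/policy/postal_policy/_execute

PUT /_ingest/pipeline/postal_lookup
{
  "processors": [
    {
      "enrich": {
        "policy_name": "postal_policy",
        "field": "geo_location",
        "target_field": "geo_data",
        "shape_relation": "INTERSECTS"
      }
    }
  ]
}

PUT /users/_doc/0?pipeline=postal_lookup
{
  "first_name": "Mardy",
  "last_name": "Brown",
  "geo_location": "POINT (13.5 52.5)"
}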

I THINK you need to wrap the pipeline definition in {}

Example:

{
  "processors": [
    {
      "enrich": {
        "policy_name": "bens_policy",
        "field": "home_gps",
        "target_field": "geo_data",
        "shape_relation": "INTERSECTS"
      }
    }
  ]
}

Unfortunately not, I get the same error! Only this time it's "position 108" in the JSON error.

I feel like we're pretty close to nailing it, there must be something not quite right though :thinking:

@benji87 interesting.

What versions of Elasticsearch and Kibana are you using? I cannot reproduce the issue in 7.9.2.

My example pipeline definition that worked:

{
  "description": "My simple ingestion pipeline",
  "processors": [
    {
      "remove": {
        "field": "message"
      }
    }
  ]
}

Hi @BenTrent

So it turns out, after all that, it was a rogue quotation mark from copying and pasting into the ingest window -

Once this was fixed, everything worked a treat!
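
For anyone else who hits this: the usual copy-paste culprit is a curly (smart) quote, which isn't valid JSON even though it looks nearly identical -

"policy_name": "postal_policy"    <-- valid: straight quotes
“policy_name”: “postal_policy”    <-- invalid: curly quotes pasted in from elsewhere

The "position" in the parse error is the character offset, which helps track the stray character down.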

When using the Console, this sort of error is flagged. I wonder whether it would be worthwhile extending that error detection to other places, such as the ingest pipeline window.

Thanks again for all your help!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.