How do I process an array with enrich processors

yhseo · November 9, 2020, 7:03am

My current task is to check the tags of each incoming document and, when a tag matches a word in the source index shown below, apply that source index's classification to the document. I am trying to build a pipeline for this using the enrich processor.

source index sample

        POST meta_division/_doc
        {
          "division": "product",
          "level1": "amazon",
          "level2": "beverage",
          "level3": "carbonated drink",
          "equal1": null,
          "equal2": "Refreshing",
          "equal3": ["cola", "sidar", "Soda Water", "Lemonade", "Citron", "Soft Drink", "Carbonic Acid"],
          "equal4": ["Coke", "coke", "coca cola", "cocacola"],
          "equal5": ["Coca-Cola", "Coke Zero", "Coclight", "Coffee Coca-Cola"]
        }
        ...
        POST meta_division/_doc
        {
          "division": "product",
          "level1": "amazon",
          "level2": "Beauty",
          "level3": "Lotion",
          "level4": null,
          "level5": null,
          "equal2": ["Beauty", "Beautiful"],
          "equal3": ["Emulsion", "Emulsion", "Moisturizer", "Moisturizer", "Fluid", "Booster", "Emulsion", "Moisturizer"],
          "equal4": null,
          "equal5": null
        }

The source index is structured as shown above. When a document like the one below comes in and a value in its TAGS list matches an entry in the source index, I want to add the corresponding fields to the document. In other words, I want to enrich each incoming document by matching it against the source index.

I have run into a few problems.

    POST /_ingest/pipeline/lookup/_simulate
    {
      "docs": [
        {
          "_index": "market_test_new",
          "_id": "id",
          "_source": {
            "TAGS": ["League", "Beverage", "Cola", "Soda", "Coca-Cola", "Lotion", "Soft Drink", "Page", "Scale", "Monument", "Esports", "Twitter"],
            ....
          }
        }
      ]
    }

First, let me ask you some questions.

When "equal3" is brought in through enrich, the entire equal3 array is copied into the document, as shown below. Can this array be filtered so that only the values that match a tag are added to the document?

"equal3": ["cola", "sidar", "Soda Water", "Lemonade", "Citron", "Soft Drink", "Carbonic Acid"],

Desired output

"equal3": ["cola", "Soft Drink"]

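To make the desired behavior concrete, here is a minimal sketch of the filtering logic in Python. It is not an enrich processor feature; it only illustrates the comparison. It assumes matching should ignore case (the tag "Cola" should keep the source value "cola"), and the helper name `keep_matching` is hypothetical. The same logic could presumably be ported to a Painless script processor that runs after the enrich processor.

```python
def keep_matching(values, tags):
    """Return only the entries of `values` that also appear in `tags`, ignoring case."""
    if values is None:
        return None
    if isinstance(values, str):
        # Source fields such as "equal2" may hold a single string instead of a list.
        values = [values]
    wanted = {tag.casefold() for tag in tags}
    return [value for value in values if value.casefold() in wanted]


tags = ["League", "Beverage", "Cola", "Soda", "Coca-Cola", "Lotion",
        "Soft Drink", "Page", "Scale", "Monument", "Esports", "Twitter"]
equal3 = ["cola", "sidar", "Soda Water", "Lemonade", "Citron",
          "Soft Drink", "Carbonic Acid"]

print(keep_matching(equal3, tags))  # ['cola', 'Soft Drink']
```

The source spelling is kept in the output, so "cola" stays lowercase even though the tag was "Cola".
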
When I enrich on level1, the result above reports that "amazon" matched a tag.

The "level1": "amazon" value in the source index seems to cause duplicates.

Result:

    "category" : [
      {
        "level1" : "amazon",
        "division" : "product"
      },
      {
        "level1" : "amazon",
        "division" : "product"
      },
      ...
    ]

Among the variables available in the script, is there one that exposes the source index document or the value that actually matched?

To remove the duplicates, I would need to check whether a value is already present.
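
As an illustration of that check, the sketch below removes repeated entries from a list shaped like the "category" result above while keeping the first occurrence of each. It is plain Python, not pipeline code, and the function name `dedupe` is hypothetical.

```python
import json


def dedupe(entries):
    """Drop entries that are identical to an earlier one, preserving order."""
    seen = set()
    unique = []
    for entry in entries:
        # Dicts are not hashable, so use a canonical JSON string as the key.
        key = json.dumps(entry, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


category = [
    {"level1": "amazon", "division": "product"},
    {"level1": "amazon", "division": "product"},
]

print(dedupe(category))  # [{'level1': 'amazon', 'division': 'product'}]
```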

In that case, does every value in the source index need to be unique?

I have not yet found an efficient structure for the source index. Its structure can be changed freely, as long as I can compare it against the original text and apply the meta classification.
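
One possible restructuring, shown only as an illustration and not something confirmed in this thread, is to flatten each source document into one row per unique term, so that every term carries its own classification. The sketch below does this in Python; the output field names `term` and `matched_field` and the function name `flatten` are assumptions made for the example.

```python
def flatten(doc, term_fields=("equal1", "equal2", "equal3", "equal4", "equal5")):
    """Turn one source document into one row per unique term."""
    # Keep the classification fields that actually have a value.
    levels = {
        key: value
        for key, value in doc.items()
        if (key == "division" or key.startswith("level")) and value is not None
    }
    rows = []
    for field in term_fields:
        value = doc.get(field)
        if value is None:
            continue
        terms = [value] if isinstance(value, str) else value
        # dict.fromkeys drops repeated terms while keeping their order.
        for term in dict.fromkeys(terms):
            rows.append({"term": term, "matched_field": field, **levels})
    return rows


lotion = {
    "division": "product",
    "level1": "amazon",
    "level2": "Beauty",
    "level3": "Lotion",
    "level4": None,
    "level5": None,
    "equal2": ["Beauty", "Beautiful"],
    "equal3": ["Emulsion", "Emulsion", "Moisturizer", "Moisturizer",
               "Fluid", "Booster", "Emulsion", "Moisturizer"],
    "equal4": None,
    "equal5": None,
}

rows = flatten(lotion)
for row in rows:
    print(row["term"], row["matched_field"], row["level3"])
```

With rows like these, a match on a single term returns only that term and its classification rather than the whole synonym array.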

Please check. Thank you.

