How do I process an array with the enrich processor?

Whenever a document comes in, I want to look at its tags and, when a tag matches a word in the source index shown below, apply that index's classification to the document. I am trying to build this as an ingest pipeline using the enrich processor.

  1. Source index sample:

        POST meta_division/_doc
        {
          "division": "product",
          "level1": "amazon",
          "level2": "beverage",
          "level3": "carbonated drink",
          "equal1": null,
          "equal2": "Refreshing",
          "equal3": ["cola", "sidar", "Soda Water", "Lemonade", "Citron", "Soft Drink", "Carbonic Acid"],
          "equal4": ["Coke", "coke", "coca cola", "cocacola"],
          "equal5": ["Coca-Cola", "Coke Zero", "Coclight", "Coffee Coca-Cola"]
        }
        ...
        POST meta_division/_doc
        {
          "division": "product",
          "level1": "amazon",
          "level2": "Beauty",
          "level3": "Lotion",
          "level4": null,
          "level5": null,
          "equal2": ["Beauty", "Beautiful"],
          "equal3": ["Emulsion", "Emulsion", "Moisturizer", "Moisturizer", "Fluid", "Booster", "Emulsion", "Moisturizer"],
          "equal4": null,
          "equal5": null
        }

The source index has the structure shown above. When a document like the one below is ingested, I want to check each value in its TAGS list against the source index and, for every match, insert the corresponding classification fields (that is, enrich each incoming document by matching it against the source index).

However, some things are not working the way I want:

    POST /_ingest/pipeline/lookup/_simulate
    {
      "docs": [
        {
          "_index": "market_test_new",
          "_id": "id",
          "_source": {
            "TAGS": ["League", "Beverage", "Cola", "Soda", "Coca-Cola", "Lotion", "Soft Drink", "Page", "Scale", "Monument", "Esports", "Twitter"],
            ....
          }
        }
      ]
    }
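To make the duplicate issue concrete, here is a rough plain-Python model of the lookup for the TAGS above. It is not the enrich processor's real implementation (which runs a term-based query against the policy's single match_field); it assumes a tag matches a source document when it equals, ignoring case, any value of level2, level3 or equal1 to equal5. SOURCE_DOCS is an abridged copy of the two sample documents, and lookup, MATCH_FIELDS and copy_fields are hypothetical names.

```python
from typing import Any

# Abridged, hypothetical in-memory copy of the two meta_division documents above.
SOURCE_DOCS: list[dict[str, Any]] = [
    {
        "division": "product", "level1": "amazon", "level2": "beverage",
        "level3": "carbonated drink", "equal1": None, "equal2": "Refreshing",
        "equal3": ["cola", "sidar", "Soda Water", "Lemonade", "Citron",
                   "Soft Drink", "Carbonic Acid"],
        "equal4": ["Coke", "coke", "coca cola", "cocacola"],
        "equal5": ["Coca-Cola", "Coke Zero", "Coclight", "Coffee Coca-Cola"],
    },
    {
        "division": "product", "level1": "amazon", "level2": "Beauty",
        "level3": "Lotion", "equal2": ["Beauty", "Beautiful"],
        "equal3": ["Emulsion", "Moisturizer", "Fluid", "Booster"],
    },
]

TAGS = ["League", "Beverage", "Cola", "Soda", "Coca-Cola", "Lotion",
        "Soft Drink", "Page", "Scale", "Monument", "Esports", "Twitter"]

# Assumption: these are the fields whose values are compared with the tags.
MATCH_FIELDS = ("level2", "level3", "equal1", "equal2", "equal3", "equal4", "equal5")


def as_list(value: Any) -> list:
    """Treat a missing/null field as empty and a single value as a one-item list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def lookup(tags: list[str], source_docs: list[dict[str, Any]],
           copy_fields: tuple[str, ...] = ("level1", "division")) -> list[dict[str, Any]]:
    """Return one entry per matching source document, like an enrich target array."""
    wanted = {t.casefold() for t in tags}
    results = []
    for doc in source_docs:
        terms = {str(v).casefold() for f in MATCH_FIELDS for v in as_list(doc.get(f))}
        if terms & wanted:
            # Each matching source document contributes its own entry, so two
            # documents sharing "level1": "amazon" yield two identical entries.
            results.append({f: doc[f] for f in copy_fields if f in doc})
    return results


category = lookup(TAGS, SOURCE_DOCS)
print(category)
```

Here "Beverage" matches the first document and "Lotion" the second, and because both carry the same level1/division pair, the model returns that pair twice, which mirrors the repeated "amazon" entries in the category result of question 2.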

Here are my questions.

  1. When "equal3" is pulled in through the enrich processor, the entire equal3 array is copied into the document, as below. Can the array be filtered so that only the words that actually match the tags are added to the document?

    "equal3": ["cola", "sidar", "Soda Water", "Lemonade", "Citron", "Soft Drink", "Carbonic Acid"]

Desired output:

    "equal3": ["cola", "Soft Drink"]

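The filtering asked for in question 1 can be sketched as an intersection between the enriched array and the document's tags. This is a Python model of logic that, in a pipeline, would presumably run in a script processor placed after the enrich processor; matching_terms is a hypothetical helper, and the case-insensitive comparison is an assumption taken from the desired output (the tag "Cola" keeps the source spelling "cola").

```python
def matching_terms(candidates: list[str], tags: list[str]) -> list[str]:
    """Keep only the candidates equal to some tag, ignoring case.

    The source-index spelling and order are preserved, and a term is kept once.
    """
    wanted = {t.casefold() for t in tags}
    kept: list[str] = []
    seen: set[str] = set()
    for term in candidates:
        key = term.casefold()
        if key in wanted and key not in seen:
            seen.add(key)
            kept.append(term)
    return kept


EQUAL3 = ["cola", "sidar", "Soda Water", "Lemonade", "Citron", "Soft Drink", "Carbonic Acid"]
TAGS = ["League", "Beverage", "Cola", "Soda", "Coca-Cola", "Lotion",
        "Soft Drink", "Page", "Scale", "Monument", "Esports", "Twitter"]

print(matching_terms(EQUAL3, TAGS))  # ['cola', 'Soft Drink']
```

Note that "Soda" is not kept: under this exact-match assumption it does not equal "Soda Water".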
  1. When matching on level1 through the enrich processor, the result for the tags above shows "amazon" as matched.

The value "level1": "amazon" appears in more than one source document, and this seems to cause the duplicates.

Result:

    "category" : [
      {
        "level1" : "amazon",
        "division" : "product"
      },
      {
        "level1" : "amazon",
        "division" : "product"
      },
      ...
    ]
  1. In a script, is there a variable that exposes the source index document, or the value that was matched, so that I can inspect it?

To remove the duplicates, I think the script would need to check whether the document already holds a value before adding another one.

Alternatively, does every value in the source index need to be unique?
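For the duplicate check, an ingest script processor exposes the document being ingested as ctx, and whatever the enrich processor wrote to its target field (here "category") is part of that document, so a script placed after the enrich processor can read and clean it. The sketch below models that step in Python, treating ctx as a plain dict; dedupe_field is a hypothetical name, and comparing entries by their JSON serialization is just one possible choice.

```python
import json
from typing import Any


def dedupe_field(ctx: dict[str, Any], field: str = "category") -> dict[str, Any]:
    """Drop repeated entries from ctx[field], keeping the first occurrence of each."""
    entries = ctx.get(field)
    if not isinstance(entries, list):
        return ctx  # nothing to clean up
    seen: set[str] = set()
    unique = []
    for entry in entries:
        # Serialize with sorted keys so entries with the same content compare equal
        # regardless of key order (and nested lists stay hashable as strings).
        key = json.dumps(entry, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    ctx[field] = unique
    return ctx


doc = {
    "TAGS": ["Beverage", "Lotion"],
    "category": [
        {"level1": "amazon", "division": "product"},
        {"level1": "amazon", "division": "product"},
    ],
}
print(dedupe_field(doc)["category"])
```

With this approach the source index values do not have to be unique; the cleanup happens on the ingested document instead.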

I have not yet found an efficient structure for the source index, so I can change it freely, as long as I can still compare the original text against it and apply the meta classification.
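One possible restructuring, offered as an assumption rather than a recommended design: store one document per term, with the term in a single field and the classification fields copied alongside (the field names keyword and matched_field below are hypothetical). An enrich policy whose match_field is that single field would then return the matched term with each hit, so only matching words end up in the document. The Python sketch builds such rows from the existing documents and drops terms repeated within a document, such as "Emulsion" in the Lotion entry.

```python
import json
from typing import Any

TERM_FIELDS = ("equal1", "equal2", "equal3", "equal4", "equal5")


def as_list(value: Any) -> list:
    """Treat a missing/null field as empty and a single value as a one-item list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def flatten(source_docs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Emit one row per (term, classification) pair, dropping repeated terms."""
    rows: list[dict[str, Any]] = []
    seen: set[tuple[str, str]] = set()
    for doc in source_docs:
        # Classification fields copied onto every row; null levels are skipped.
        base = {k: v for k, v in doc.items()
                if v is not None and (k == "division" or k.startswith("level"))}
        for field in TERM_FIELDS:
            for term in as_list(doc.get(field)):
                key = (term, json.dumps(base, sort_keys=True))
                if key in seen:
                    continue  # e.g. "Emulsion" listed three times in equal3
                seen.add(key)
                rows.append({"keyword": term, "matched_field": field, **base})
    return rows


beauty = {
    "division": "product", "level1": "amazon", "level2": "Beauty",
    "level3": "Lotion", "level4": None, "level5": None,
    "equal2": ["Beauty", "Beautiful"],
    "equal3": ["Emulsion", "Emulsion", "Moisturizer", "Moisturizer", "Fluid",
               "Booster", "Emulsion", "Moisturizer"],
    "equal4": None, "equal5": None,
}
rows = flatten([beauty])
for row in rows:
    print(row)
```

Term matching on a keyword field in Elasticsearch is exact, so "Cola" and "cola" would only match if both sides are normalized the same way. Several rows can still share "level1": "amazon", so a deduplication step may still be needed.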

Any advice would be appreciated. Thank you.
