Array search from Logstash elasticsearch plugin

I'm bringing records from a SQL Server database into Elasticsearch using Logstash. I need to convert a numeric field that can occur several times for each SQL Server record into a single array field in the corresponding Elasticsearch document. Each record that becomes an ES document is associated with one or more other records (I'll call those obj_1), and each obj_1 is in turn associated with one or more tag_field values. Something like this:

SQL Server rows:

id       obj_1    obj_1_tag_field
1        A        1
1        B        2
1        B        3
2        A        1
2        C        4

Desired ES documents:

{
    "id": 1,
    "obj_1": [
        "A",
        "B"
    ],
    "tag_field": [
        1,
        2,
        3
    ]
},
{
    "id": 2,
    "obj_1": [
        "A",
        "C"
    ],
    "tag_field": [
        1,
        4
    ]
}
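For reference, the row-to-document collapse above (what an aggregate filter would normally do) can be sketched in plain Python rather than Logstash config. It assumes rows arrive as (id, obj_1, obj_1_tag_field) tuples and that repeated values should be dropped while keeping first-seen order; `rows_to_documents` is a hypothetical helper name.

```python
# Rows as they come back from the SQL Server join: (id, obj_1, obj_1_tag_field)
rows = [
    (1, "A", 1),
    (1, "B", 2),
    (1, "B", 3),
    (2, "A", 1),
    (2, "C", 4),
]


def rows_to_documents(rows):
    """Collapse joined rows into one document per id, with de-duplicated arrays."""
    docs = {}  # dicts keep insertion order (Python 3.7+), so ids stay in row order
    for row_id, obj_1, tag in rows:
        doc = docs.setdefault(row_id, {"id": row_id, "obj_1": [], "tag_field": []})
        # Append each value only once, keeping the order in which it was first seen
        if obj_1 not in doc["obj_1"]:
            doc["obj_1"].append(obj_1)
        if tag not in doc["tag_field"]:
            doc["tag_field"].append(tag)
    return list(docs.values())


documents = rows_to_documents(rows)
for doc in documents:
    print(doc)
```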

Ordinarily, I'd do a join in SQL Server and use a Logstash aggregate filter to combine the rows for each id. In this case, though, I'm already aggregating some other fields, and because of the nesting and aggregating, the number of rows per id explodes: pulling 10,000 ids produces something like 4 million rows, which quickly overwhelms SQL Server.
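To put the growth in numbers: 4 million rows for 10,000 ids is 400 rows per id on average, and the joins multiply rather than add. Taking id 1 from the example, the obj_1 join alone yields one row per (obj_1, tag) pair, three in total. If another aggregated one-to-many field is joined as well (hypothetical here, with n values for that id), the row count is the product:

$$
\frac{4\,000\,000\ \text{rows}}{10\,000\ \text{ids}} = 400\ \text{rows per id},
\qquad
\text{rows}(\text{id}=1) = \underbrace{(1 + 2)}_{\text{A: 1 tag, B: 2 tags}} \times n = 3n.
$$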

However, the obj_1 records already exist in ES as their own documents, together with their associated tags, and they rarely change. A, B and C above look something like this:

{
    "id": "A",
    "tag_field": [
        1
    ]
},
{
    "id": "B",
    "tag_field": [
        2,
        3
    ]
},
{
    "id": "C",
    "tag_field": [
        4
    ]
}

So I'd like to use the obj_1 links to query ES with the logstash-filter-elasticsearch plugin and copy the tag_field arrays it finds into the Logstash event. The problem is that the obj_1 field on the Logstash event is an array. I know how to use the plugin to look up a single Elasticsearch document by its id or a similar single value, but I haven't managed to do it with an array.
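On the Elasticsearch side, matching a whole array in one request is what the `ids` query does. The Python sketch below only builds that request body and merges the `tag_field` arrays from the hits. It assumes each obj_1 document's `_id` equals its obj_1 value (if not, a `terms` query on the `id` field would be the equivalent). The response is a hand-written mock shaped like a search response, and whether the Logstash filter can send this body directly is not confirmed here.

```python
import json


def build_ids_query(obj_1_ids):
    """Query DSL body matching every document whose _id is in obj_1_ids."""
    values = list(obj_1_ids)
    return {
        "query": {"ids": {"values": values}},
        "_source": ["tag_field"],  # only fetch the field we need
        "size": len(values),       # default size is 10; ask for one hit per id
    }


def merge_tag_fields(search_response):
    """Flatten tag_field arrays from all hits into one de-duplicated list."""
    merged = []
    for hit in search_response["hits"]["hits"]:
        for tag in hit["_source"].get("tag_field", []):
            if tag not in merged:
                merged.append(tag)
    return merged


event = {"id": 1, "obj_1": ["A", "B"]}
print(json.dumps(build_ids_query(event["obj_1"])))

# Hypothetical response, shaped like an Elasticsearch search response
response = {"hits": {"hits": [
    {"_id": "A", "_source": {"tag_field": [1]}},
    {"_id": "B", "_source": {"tag_field": [2, 3]}},
]}}
event["tag_field"] = merge_tag_fields(response)
print(event)
```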

With all that said, here's my main question: is there a way to use the logstash-filter-elasticsearch plugin to match an array field on the event against ids in Elasticsearch? Or can it be done with a Ruby filter and a loop?
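The loop half of the Ruby-filter idea is simple. The open question is where each id's tags come from. Here is a minimal sketch of the loop logic, written in Python, so a Ruby filter would need a translation. It assumes a hypothetical in-memory table `OBJ_1_TAGS` mapping each obj_1 id to its tag_field array. That is plausible only because obj_1 rarely changes; how the table is loaded and refreshed is left open.

```python
# Hypothetical cache: obj_1 id -> tag_field, loaded once (e.g. from the obj_1 index)
OBJ_1_TAGS = {
    "A": [1],
    "B": [2, 3],
    "C": [4],
}


def enrich_event(event, lookup):
    """Loop over the event's obj_1 values and collect each linked tag_field once."""
    obj_ids = event.get("obj_1", [])
    if isinstance(obj_ids, str):  # a single value may arrive as a plain string
        obj_ids = [obj_ids]
    tags = []
    for obj_id in obj_ids:
        for tag in lookup.get(obj_id, []):  # unknown ids contribute nothing
            if tag not in tags:
                tags.append(tag)
    event["tag_field"] = tags
    return event


events = [{"id": 1, "obj_1": ["A", "B"]}, {"id": 2, "obj_1": ["A", "C"]}]
for e in events:
    print(enrich_event(e, OBJ_1_TAGS))
```

Compared with one `ids` query per event, this trades a round trip to Elasticsearch for memory and the job of keeping the table current.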
