How to store shingle tokens to data fields of a document?

I've been searching for a way to append tokens created by the shingle token filter to a document's field data as it's being ingested through Logstash. All the documentation I've found so far deals with using the shingle token filter via the REST _analyze API on text that has already been logged and indexed, but I haven't found a way to append those tokens to field values as a document is being indexed.

In a nutshell, here's what I'm trying to accomplish. Suppose I have the message
STORE THE SHINGLES.
Applying a shingle filter with both min and max shingle size set to 2, we get the tokens:
STORE THE, and THE SHINGLES.
I would like to find a way to ingest the message through Logstash such that it is indexed like so:

{
    ...
    "message": "STORE THE SHINGLES",
    "shingle_2": ["STORE THE", "THE SHINGLES"],
    ...
}
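For reference, with unigram output disabled, 2-shingles over whitespace tokens are just consecutive word pairs, so the target field above can be reproduced outside Elasticsearch. A minimal Python sketch (the function name is my own):

```python
def shingles(text, size=2):
    """Return consecutive word n-grams (shingles) of the given size,
    mimicking a shingle filter with min/max shingle size equal to `size`
    and output_unigrams disabled."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]

doc = {"message": "STORE THE SHINGLES"}
doc["shingle_2"] = shingles(doc["message"])
# doc is now {"message": "STORE THE SHINGLES",
#             "shingle_2": ["STORE THE", "THE SHINGLES"]}
```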

If this is not possible, I was hoping there is a way to append such shingle data to documents that have already been logged and indexed, turning this:

{
    ...
    "message": "STORE THE SHINGLES",
    ...
}

...into this:

{
    ...
    "message": "STORE THE SHINGLES",
    "shingle_2": ["STORE THE", "THE SHINGLES"],
    ...
}

Thanks in advance for any help or advice!

Given this seems to be focussed around processing in Logstash I have moved the thread.

For the time being, I am considering just calling the shingle filter through the REST _analyze API with curl, extracting the needed tokens from the JSON responses, and storing them in a local file with a custom shell script (I have about 200,000 text segments to run through, which I think is a manageable size). After that I could use the translate filter to match messages to their corresponding shingle tokens.
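In case it helps anyone searching later, the approach above can be sketched in Python instead of a shell script. The request body is the standard _analyze form for an inline custom shingle filter; the host URL is a placeholder for your own cluster:

```python
import json
import urllib.request

ANALYZE_URL = "http://localhost:9200/_analyze"  # placeholder: point at your cluster

def analyze_request_body(text):
    """Build the _analyze request body for an inline 2-shingle filter."""
    return {
        "tokenizer": "whitespace",
        "filter": [{
            "type": "shingle",
            "min_shingle_size": 2,
            "max_shingle_size": 2,
            "output_unigrams": False,
        }],
        "text": text,
    }

def extract_tokens(analyze_response):
    """Pull just the token strings out of an _analyze JSON response."""
    return [t["token"] for t in analyze_response["tokens"]]

def shingles_via_rest(text):
    """Call the _analyze API and return the shingle tokens (needs a live cluster)."""
    req = urllib.request.Request(
        ANALYZE_URL,
        data=json.dumps(analyze_request_body(text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_tokens(json.load(resp))
```

Each message's tokens can then be written out as a dictionary file keyed by the original message (YAML, JSON, and CSV are the dictionary formats the translate filter accepts), so the filter can look up the shingles at ingest time.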

EDIT: Solved using the above method!
