I am attempting to use Update By Query with Logstash to copy a field [scenario] between entries that share the same [file] value. I am implementing the solution described by Badger in this thread: https://discuss.elastic.co/t/how-to-share-field-data-between-documents-of-the-same-file-path/250838
Badger's solution works like this:
- Configure the index with a boolean field called something like scenarioAdded that defaults to false.
- Run Logstash with an elasticsearch input that fetches all records that have a [scenario] field and have [scenarioAdded] set to false.
- Feed those records to an http output that makes an update-by-query call to Elasticsearch, which adds [scenario] and sets scenarioAdded to true for all documents with the same [file].
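Put together, the steps above describe a second Logstash pipeline that reads entries back out of Elasticsearch instead of from the log files. Below is a minimal sketch of that pipeline, not Badger's exact configuration. The index name testlogs, the flag name scenarioAdded, and the file.keyword / scenario.keyword sub-fields are assumptions; the sub-fields are what Elasticsearch's default dynamic mapping creates for string fields. The http output uses format => "message" so that each fetched entry becomes a hand-written update-by-query body, with %{[scenario]} and %{[file]} filled in from that entry.

```conf
# Second pipeline (sketch). Assumed: index "testlogs", boolean flag
# "scenarioAdded", and keyword sub-fields "file.keyword" and
# "scenario.keyword" created by Elasticsearch's default dynamic mapping.
input {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "testlogs"
    # Entries that carry a non-empty scenario and have not been propagated yet
    query => '{
      "query": {
        "bool": {
          "filter": [
            { "exists": { "field": "scenario" } },
            { "term": { "scenarioAdded": false } }
          ],
          "must_not": [
            { "term": { "scenario.keyword": "" } }
          ]
        }
      }
    }'
  }
}
output {
  http {
    # conflicts=proceed: skip version conflicts instead of aborting the request
    url => "http://localhost:9200/testlogs/_update_by_query?conflicts=proceed"
    http_method => "post"
    content_type => "application/json"
    # Send a hand-written request body instead of the serialized event
    format => "message"
    # %{[scenario]} and %{[file]} are substituted from the fetched entry;
    # every document with the same file gets the scenario and the flag set to true
    message => '{
      "script": {
        "lang": "painless",
        "source": "ctx._source.scenario = params.scenario; ctx._source.scenarioAdded = true",
        "params": { "scenario": "%{[scenario]}" }
      },
      "query": { "term": { "file.keyword": "%{[file]}" } }
    }'
  }
}
```

The values are pasted into the JSON body as plain text. A scenario or file name that contains a double quote or a backslash would therefore produce an invalid request. The must_not clause excludes entries that were given an empty scenario at ingest time. The elasticsearch input runs its query once unless its schedule option is set.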
I am using this example and this example as templates, but I have several questions about my Logstash .conf file and about how to copy the data between entries with the same [file] in the query itself. My current .conf file looks like this:
```conf
input {
  file {
    path => "C:/TestInputFolders/*/reports/logs/*.log"
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}
filter {
  # Parse the log line; only some lines contain a scenario
  grok {
    match => {
      "message" => ["%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{LOGLEVEL:loglevel}%{SPACE}:%{SPACE}%{INT:threadNumber}%{SPACE}%{DATA:class}:%{INT:classLine}%{SPACE}-%{SPACE}%{DATA}scenario:%{SPACE}%{GREEDYDATA:scenario}",
                    "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{LOGLEVEL:loglevel}%{SPACE}:%{INT:threadNumber}%{SPACE}%{GREEDYDATA:class}:%{INT:classLine}%{SPACE}-%{SPACE}%{GREEDYDATA:errorType}%{SPACE}%{GREEDYDATA:errorInfo}"]
    }
  }
  # Extract the folder and file name from the path of the log file
  grok {
    match => {
      "path" => "C:/TestInputFolders/%{DATA:folder}/reports/logs/%{GREEDYDATA:file}"
    }
  }
  # Mark entries that have no scenario of their own
  if ![scenario] {
    mutate {
      add_field => {
        "hasScenario" => "false"
        "scenario" => ""
      }
    }
  }
}
output {
  http {
    url => "http://localhost:9200/index/doc/_update_by_query"
    http_method => "post"
    format => "json"
  }
}
```
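The update by query in the approach described above works on entries that are already stored in the index. One way to arrange this is to have the first pipeline simply index every parsed entry with the flag set to false, using an elasticsearch output rather than the http output. A minimal sketch follows. It assumes an index named testlogs whose mapping already declares scenarioAdded as a boolean; Elasticsearch accepts the string "false" for a boolean field. The message grok from the configuration above is omitted for brevity and would go before the path grok.

```conf
# First pipeline (sketch): index every log entry with the flag set to false.
# Assumed: index "testlogs" with scenarioAdded mapped as boolean.
input {
  file {
    path => "C:/TestInputFolders/*/reports/logs/*.log"
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}
filter {
  # The "message" grok from the configuration above goes here.
  grok {
    match => {
      "path" => "C:/TestInputFolders/%{DATA:folder}/reports/logs/%{GREEDYDATA:file}"
    }
  }
  mutate {
    # Every entry starts out unprocessed; the update by query later sets it to true.
    # The string "false" is accepted by a boolean-mapped field.
    add_field => { "scenarioAdded" => "false" }
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "testlogs"
  }
}
```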
So far my query looks like this, although I know it is currently incorrect (I am not sure how to get the [scenario] field from other entries with the same [file] field):
```
POST testlogs/_update_by_query
{
  "script": {
    "source": "ctx._source.scenario += params.scenario",
    "lang": "painless",
    "params": {
      "scenario": ""
    }
  },
  "query": {
    "term": {
      "hasScenario": "false"
    }
  }
}
```
There is a lot I don't understand about this method, and I would appreciate help implementing it. First, where do I put my query? I see that there is a console in Kibana where I can test it, but I imagine it must be saved somewhere. Second, how do I correctly set up the inputs and outputs in the config so that events go to the http output to do the update by query, and then to my Elasticsearch index once that is done? And third, how do I structure my query so that the [scenario] field is copied to all entries with the same [file] field? Any help is greatly appreciated!