How to handle the case when the index value is not included in the "path" value?


(chun ji) #1

Hi there,
I am working on an Elastic project that generally follows this model:

" Several Logstash shippers that reside on their respective hosts <==> one single Redis server <==> multiple Logstash indexers <==> one Elasticsearch server ".

As we understand it, every log message should come with an "index string", under which the message is registered on the ES side. But what happens if that index value is not included in the "path" value? For every shipper host, this unique value lives in a property file.

How can we pass this value along with every line of log messages to the ES server?

Thanks a lot.

Chun


(Mark Walkom) #2

What do you mean by this?


(chun ji) #3

Hi All,
Our system is used by multiple users every day to submit different jobs. While a job is being processed, we would like Logstash to do real-time analysis whenever a log message is raised, and if so, have that message indexed under the current job-id.

As you can see in the sample I put at the bottom, this is the type of message the "Indexer Logstash" is given, where the "path" does not carry the job-id value. The actual value is kept in a property file. Following a guide on using environment variables in Logstash configuration, I set the value in my environment and had shipper.conf read it. That is why you see the "tags" field carrying the unique job-id (36293300) now.
"
{
"path" => "/user/logs/work/TOP/SFO/aaa/bbb/logs/server.out",
"message" => " my log messages here ..."
"type" => "sjc",
"tags" => [
[0] "36293300"
]
}
",

So overall, this problem has been solved, and this group has been very helpful for us.

Thanks a lot for the help.

CJ


(chun ji) #4

Hi All,
I have a second question related to this. When each job is started by a user, an index for that job is now created. But besides those, I have seen a different index id show up.

Here is an example of my index list when I run the index command: " curl -X GET "http://ES_server:9200/_cat/indices" | sort ",

And here is the output I got:
"
....
yellow open job-36416492 5 1 379772 0 57.6mb 57.6mb
yellow open job-36418546 5 1 183099 0 40.7mb 40.7mb
yellow open job-36418556 5 1 145342 0 35.2mb 35.2mb
yellow open job-%{jobid} 5 1 8842147 0 1.4gb 1.4gb
yellow open .kibana 1 1 2 2 10.7kb 10.7kb
".

I don't know why this entry exists: "
yellow open job-%{jobid} 5 1 8842147 0 1.4gb 1.4gb
".
And it seems to grow with every job submitted?

Thanks for the help.

CJI


(Christian Dahlqvist) #5

That index is created for events that do not contain a jobid that Logstash can use to build the index name. Creating very small indices per jobid is not recommended, as each index and shard comes with a certain amount of overhead. Having a large number of very small shards in a cluster is very inefficient, and this approach will scale badly.
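One common way to keep such events out of a literal "job-%{jobid}" index is to route events that lack the jobid field to a fallback index in the indexer's output section. This is only a sketch, not the original poster's config; the host, field name, and fallback index name are assumptions based on the thread:

```
# Sketch of an indexer output section: events where no jobid was
# extracted go to a fixed fallback index instead of a literal
# "job-%{jobid}" index.
output {
  if [jobid] {
    elasticsearch {
      hosts => ["ES_server:9200"]
      index => "job-%{jobid}"
    }
  } else {
    elasticsearch {
      hosts => ["ES_server:9200"]
      index => "job-unparsed"
    }
  }
}
```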


(chun ji) #6

Hi Christian,
Thanks for the response.
Here is our scenario: each of our jobs contains hundreds of thousands of lines of log messages from different server locations. We would like to have each job's logs indexed under one index, so that later we can provide quick responses to a client's "job requirements".

Anyway, back to that strange "job-%{jobid}" index I noticed. I have tried different ways to solve it, and here is what I have found so far.

In my Logstash indexer config file, I have this chunk of settings:
"
...
filter {
  grok {
    match => {
      "message" => ["%{Customized_PATCH:some_pattern1_here}",
                    "%{Customized_FAILURE:some_pattern2_here}"]
      "tags" => ["%{NUMBER:jobid}"]
    }
  }
  ...
}
".
This tag value is created at the shipper level, and the indexer extracts it to create the unique index name.

What I have found is:

  1. If the tags part is defined after the message part, the extra "job-%{jobid}" index is generated.
  2. If the tags part is defined before the message part, there is no more "job-%{jobid}", but the customized patterns no longer work; they are totally ignored even when there is a pattern match in the log message.

Is there any way to have both working?

Thanks a lot for the help.

Chun


(chun ji) #7

I have found the answer to this question: I have to keep them in separate grok filters, and then both work fine. Something like:
"
filter {
  grok {
    match => {
      "message" => ["%{Customized_PATCH:some_pattern1_here}",
                    "%{Customized_FAILURE:some_pattern2_here}"]
    }
  }  # end of 1st grok
  grok {
    match => {
      "tags" => ["%{NUMBER:jobid}"]
    }
  }  # end of 2nd grok
}  # end of filter
"


(Christian Dahlqvist) #8

How many jobs do you expect to have indexed in the cluster at any point in time?


(chun ji) #9

It is still in the trial stage. If everything is stable, we expect 100 to 150 jobs to be indexed every 24 hours. We may also need index deletion logic, as we are only interested in indices created in the past 3-4 days. In our current design, every shipper is started in parallel with the running job on its own host.


(Christian Dahlqvist) #10

That will leave you with approximately 500-600 indices in the cluster, if I understand you correctly. That may be OK, depending on the size of the nodes and cluster, but as your indices are so small I would recommend configuring each of them to have 1 shard instead of the default 5.

As said earlier, if you intend to increase this volume going forward, this approach will not scale well.
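A shard-count override like that is typically applied once with an index template matching the "job-*" pattern, so every new per-job index picks it up automatically. This is a sketch against the legacy _template API (the "template" match field, as used on the 5.x-era versions where 5 shards was the default); the template name is made up:

```
curl -X PUT "http://ES_server:9200/_template/job_indices" -H 'Content-Type: application/json' -d '
{
  "template": "job-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}'
```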


(system) #11

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.