Time taken by ELK stack to create indices

How much time does it take to create indices using ELK services via a docker-compose file?
We process 450 files with a total size of 4.6 GB to create 450 indices. All of these files are log files. It takes around 8 hours to ingest the data from the directory and create the indices in the ELK stack.
The server we are using has 16 GB of RAM and a 2 TB hard disk, but it is still taking a lot of time to create the indices.
Is this figure normal? Does ELK take this long to create indices? How much time should 1 GB take to index? How can it be optimised?

Creating an index per file does not seem optimal. Generally indices are created per file type instead, as having lots of small indices and shards is inefficient and can cause problems down the line.

How are you ingesting the data? Are you using Logstash or Filebeat?

In order to optimise indexing performance I would recommend looking into this official guide.

Hi Christian,
Thanks for the reply.
By 450 files I mean 450 different directories, each containing different log types, and each log type has its own log files. We are ingesting the data using Logstash.
We allocated more memory to the Docker containers, but the roughly 15 GB across these 450 directories still takes 6-7 hours to ingest.
What is the best way to optimise this?

Look at the link I provided in my previous response. This is a great starting point.

I am not sure I understand this. Typically indices are designed to hold data of specific types and not different types of data from a certain location. The only reason I could see for the approach you have taken is if each directory belongs to a different user and you want to control access at the index level. Is that the case?

Yes, I am looking at the link.
The 450 directories are basically 450 different machines, each containing similar kinds of log files. Each machine is indexed separately so that we can query the log files and find the patterns present in each kind of machine.

That does not make sense to me. I would recommend that you instead index into a single index and add data about the host or directory to the events so that you can easily filter on it in Kibana or in queries. Indexing into a large number of indices will be inefficient and slow down ingestion. The thing that makes bulk indexing fast is that groups of documents are sent to each shard and indexed together, reducing the overhead per document. If a bulk request contains documents for a very large number of shards, the groups of documents sent to each shard are likely to be small and you get much more overhead per document.
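As a rough illustration only (the paths, field names and index name below are assumptions, not your actual setup), a single-index Logstash pipeline along those lines could look something like this:

```
input {
  file {
    # assumed layout: /data/<machine>/<logtype>.log
    path => "/data/*/*.log"
    start_position => "beginning"
  }
}

filter {
  # Derive the machine (directory) and log type from the source path so
  # they become filterable fields instead of separate indices. Depending
  # on your ECS compatibility setting the path may be in [log][file][path]
  # rather than "path".
  dissect {
    mapping => { "path" => "/data/%{machine}/%{logtype}.log" }
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    # one shared, time-based index for all machines
    index => "machine-logs-%{+YYYY.MM.dd}"
  }
}
```

You can then filter on the machine and logtype fields in Kibana or in your queries instead of picking a different index per machine.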

I have seen users collect data from thousands of machines into a single index and then filter the way I explained. Your approach does not scale and may not perform very well either as it may result in a lot of small indices.

The shape of our data structure:
Directories A, B, ..., 16k, each unrelated to one another

  1. A
    a) RS log
    b) ME log
    c) CH log
  2. B
    a) RS log
    b) ME log
    c) CH log
  3. ... and so on, up to 16k
We are indexing each directory separately, creating indices named A, B, C, ..., 16k (roughly along the lines of the sketch below).
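Simplified, the per-directory output is roughly along these lines (the paths and names are placeholders rather than our exact configuration):

```
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    # one index per machine/directory ("machine" is assumed to be a field
    # derived from the file path; index names must be lowercase)
    index => "machine-%{machine}"
  }
}
```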

I believe you are telling us to create only one index and put all the log files as different events within that same index. Can you send a link for this kind of approach? Any example or link to look at and implement?

That is the standard/default approach and would be the default configuration, e.g. if you used Filebeat to index directly into Elasticsearch. As it is the default, I have not found anything in the standard docs that explicitly describes it. This blog post contains some explicit guidelines that are largely still applicable even though the post is old and a lot of improvements have been made in recent versions, e.g.:

TIP: In order to reduce the number of indices and avoid large and sprawling mappings, consider storing data with similar structure in the same index rather than splitting into separate indices based on where the data comes from.

This section in the docs also contains some guidelines.
