I need to import around 400GB of a JSON log file (about 300 million records) in order to run some searches and visualizations using Kibana.
I did some research to understand the best configuration for a dataset this large, but I would like some help.
I am planning to use only one server for this.
Thanks @A_B.
I have a Logstash conf file ready to start the import, but I wanted to understand the cluster/node setup better before starting, in order to avoid any crashes during the import (as it involves millions of records).
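For context, the conf file follows roughly this shape (the file path, host, and index name below are placeholders, not my actual values):

```conf
input {
  file {
    path => "/data/logs/big.json"    # placeholder path
    start_position => "beginning"
    sincedb_path => "/dev/null"      # reread from the start on each run
    codec => "json"                  # one JSON document per line
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "logs"
  }
}
```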
It will take some experimentation and tuning to be sure that your setup has the performance characteristics you need. You might like to try importing increasingly large subsets of your log file first, to get a feel for the performance and to make sure that your mappings are set up suitably for the searches you want to perform.
If the index will eventually be 400GB then this article suggests you will want to split it into around 10-20 shards. However if you do not need all the fields to be indexed then you might find your index becomes much smaller than the source data, so you will be able to work with correspondingly fewer shards.
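Since the shard count is fixed when an index is created, you would set it explicitly up front, along these lines (the index name and the count of 15 are illustrative, not a recommendation):

```
PUT /logs
{
  "settings": {
    "number_of_shards": 15,
    "number_of_replicas": 0
  }
}
```

Setting replicas to 0 during a one-off bulk import avoids the cost of writing every document twice; you can raise `number_of_replicas` after the import finishes.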
If your searches will be filtering on time ranges (common for searches of log data) then you might want to consider splitting the data into time-based indices rather than putting it into a single index with lots of shards. Using time-based indices will allow Elasticsearch to completely avoid searching any shards which it knows in advance do not contain any documents that match the time range specified in the search.
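With Logstash, time-based indices are just a matter of putting a date pattern in the output's index name (a sketch; the `logs-` prefix and daily granularity are assumptions, and the records need a parseable timestamp mapped to `@timestamp`):

```conf
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "logs-%{+YYYY.MM.dd}"   # one index per day, derived from the event's @timestamp
  }
}
```

Searches can then target a wildcard pattern such as `logs-*`, and a time-range filter lets Elasticsearch skip the daily indices that fall outside the range entirely.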