Elasticsearch cluster / nodes / shards conf


(Thiago) #1

Hi,

I need to import a JSON log file of around 400GB (about 300 million records) so that I can run searches and build visualizations in Kibana.
I did some research to understand the best configuration for a large data set like this, but I would like some help.
I am planning to use only one server for this.

Thanks


#2

You should be able to start Elasticsearch and Kibana with default settings.

To ship the JSON to Elasticsearch you can use Logstash or Filebeat.

In Logstash you would use a file input and Elasticsearch output.

Or if you use Filebeat you would configure a log input and Elasticsearch output.


(Thiago) #3

Thanks @A_B.
I have a Logstash conf file ready to start the import, but I wanted to understand the cluster/node setup better before starting, in order to avoid any crash during the import (as it is millions of records).


#4

First test with maybe 10 or 100 logs. If everything looks good, then start the "real" import.
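
A minimal sketch of cutting such a test file, assuming the log is newline-delimited JSON (one record per line). The function name `write_sample` and the file names are made up for illustration:

```python
import json


def write_sample(source_path, sample_path, count=100):
    """Copy the first `count` non-empty lines of an NDJSON file to a new file.

    Each line is parsed before it is copied, so a malformed record is caught
    here rather than during the test import. The source is read line by line,
    so the full 400GB file is never loaded into memory.
    Returns the number of records written.
    """
    written = 0
    with open(source_path, encoding="utf-8") as src, \
            open(sample_path, "w", encoding="utf-8") as dst:
        for line in src:
            if written >= count:
                break
            line = line.strip()
            if not line:
                continue  # skip blank lines
            json.loads(line)  # raises ValueError on a malformed record
            dst.write(line + "\n")
            written += 1
    return written


if __name__ == "__main__":
    import os

    # "full.json" and "sample-100.json" are placeholder file names.
    if os.path.exists("full.json"):
        print(write_sample("full.json", "sample-100.json", count=100))
```

The sample file can then be fed to the same Logstash conf as the full file, just with the input path changed.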


(David Turner) #5

It will take some experimentation and tuning to be sure that your setup performs as you need. You might like to try importing increasingly large subsets of your log file first, both to get a handle on the performance characteristics and to make sure that your mappings suit the searches you want to perform.

If the index will eventually be 400GB then this article suggests you will want to split it into around 10-20 shards. However, if you do not need all the fields to be indexed, you might find that your index becomes much smaller than the source data, so you will be able to work with correspondingly fewer shards.
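
The arithmetic behind that range can be sketched roughly as follows. The 20-40GB per-shard targets are simply what 10-20 shards for 400GB implies, not figures quoted from the article, and `shards_needed` is a hypothetical helper:

```python
import math


def shards_needed(index_size_gb, target_shard_size_gb):
    """Smallest number of primary shards that keeps every shard at or below the target size."""
    if index_size_gb <= 0 or target_shard_size_gb <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(index_size_gb / target_shard_size_gb)


# A 400GB index at the two ends of the implied per-shard range.
print(shards_needed(400, 40))  # 10
print(shards_needed(400, 20))  # 20

# If leaving fields unindexed shrank the index to 150GB (an assumed figure).
print(shards_needed(150, 40))  # 4
```

Measuring the index size after importing a subset gives a better input for this calculation than the size of the source file.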

If your searches will be filtering on time ranges (common for searches of log data) then you might want to consider splitting the data into time-based indices rather than putting it into a single index with lots of shards. Using time-based indices will allow Elasticsearch to completely avoid searching any shards which it knows in advance do not contain any documents that match the time range specified in the search.
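
To illustrate the idea, here is a small sketch that maps record timestamps to daily index names and lists which indices could hold matches for a given time range. The `logs-YYYY.MM.DD` naming pattern, the daily granularity, and both function names are assumptions made for the example:

```python
from datetime import datetime, timedelta


def index_for(timestamp, prefix="logs"):
    """Name of the daily index that a record with this timestamp belongs to."""
    return f"{prefix}-{timestamp:%Y.%m.%d}"


def indices_for_range(start, end, prefix="logs"):
    """Daily indices that can contain records between start and end (inclusive)."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    first_day = start.date()
    days = (end.date() - first_day).days
    return [index_for(first_day + timedelta(days=n), prefix) for n in range(days + 1)]


print(index_for(datetime(2019, 3, 5, 14, 30)))
# logs-2019.03.05

print(indices_for_range(datetime(2019, 3, 30, 23, 0), datetime(2019, 4, 1, 1, 0)))
# ['logs-2019.03.30', 'logs-2019.03.31', 'logs-2019.04.01']
```

A search limited to that window only has to look at three daily indices, however many days of logs have been imported in total.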

