Hello everyone
I'd like to ask you 2 questions.
I receive approximately 270 CSV files per month, around 9 per day. Each CSV is between 1 kB and 3 MB in size. All these CSVs are sent to the same Elasticsearch index via Logstash. I would like to know what the maximum storage capacity of an Elasticsearch index is.
I would also like to know how to automate the import into Elasticsearch via Logstash, so that each time a new file arrives in the directory it gets picked up, without re-importing the files that were already there.
The sky is the limit...
More seriously, the limitation will most likely come from your hardware.
As Elasticsearch is elastic, you can scale out (add more machines) to get even more space...
But I'd recommend not going above 50 GB per shard.
Depending on your use case, you may want to consider using data streams.
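Data streams are enabled through an index template (more on templates below). A minimal sketch, with placeholder names, could look like this (note that a data stream also requires a @timestamp field in every document):
PUT _index_template/csv-stream-template
{
  "index_patterns": ["csv-stream*"],
  "data_stream": {},
  "priority": 200
}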
How can I shard my index with Logstash?
You can define the number of shards when you create the index in Elasticsearch or using an index template.
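For example, at index creation time (the index name here is just a placeholder):
PUT csv-2024
{
  "settings": {
    "number_of_shards": 1
  }
}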
But I am not creating my index in Elasticsearch directly. I create it with Logstash.
You need to create an index template in Elasticsearch first.
Do I need to create the index and specify the column names?
You need to create an index template.
You can let Elasticsearch guess what you want to do with the "columns" (we say "fields" in Elasticsearch vocabulary), but I'd definitely recommend defining this before the first document is indexed.
Something like:
PUT _index_template/csv-template
{
  "index_patterns": ["csv*"],
  "template": {
    "settings": {
      "number_of_shards": 1
    },
    "mappings": {
      "properties": {
        "my_field1": {
          "type": "text"
        },
        "my_field2": {
          "type": "date"
        }
      }
    }
  },
  "priority": 100
}
Source:
This example assumes that you configure Logstash to send the data to an index whose name starts with csv.
...
Thank you. But I don't understand this line: "index_patterns": ["csv*"]
This means that the template will be applied to every index whose name starts with csv, so in your Logstash output you would configure the output index to be csv-something, for example.
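For instance, a sketch of the Logstash output section (the hosts value and the date suffix are assumptions, adjust to your setup):
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]  # assumed local cluster
    index => "csv-%{+YYYY.MM.dd}"       # any name starting with "csv" matches the template
  }
}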
It is explained in the already shared documentation.
got it. Thank you
What sort of data are you ingesting? Is it time based?
Yes!
Hello, I have a question.
After creating my index template, I'd like to send my data to an Elasticsearch index. In my context, I receive data every day. How can I automate the Logstash task so that it sends the new files that arrive to Elasticsearch every day?
The best way is to run Logstash as a service and configure your input to read files in a specific folder; when a new file appears in this folder, Logstash will automatically read and process it.
How to run Logstash as a service depends on your operating system; just follow the documentation.
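A minimal pipeline sketch, assuming the CSV files land in /data/csv (the paths, the sincedb location, and the column handling are assumptions to adapt):
input {
  file {
    path => "/data/csv/*.csv"                        # hypothetical drop folder
    start_position => "beginning"                    # read new files from their first line
    sincedb_path => "/var/lib/logstash/sincedb-csv"  # remembers which files/offsets were already read
  }
}
filter {
  csv {
    autodetect_column_names => true   # or list the columns explicitly with columns => [...]
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "csv-%{+YYYY.MM.dd}"
  }
}
The sincedb file is what prevents Logstash from re-importing the files that were already there.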
Make sure you are using ILM then.
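For example, a policy sketch with placeholder retention values (the policy name, the 50gb rollover threshold, and the one-year delete age are all assumptions):
PUT _ilm/policy/csv-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb"
          }
        }
      },
      "delete": {
        "min_age": "365d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
You would then reference the policy from the index template via the index.lifecycle.name setting so that new indices pick it up.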
When I create my index template, if I set the number of shards to 2, what happens when I send my data to the indices?
Hello,
I have one more question.
Let me put it in context:
I've created an index template in Elasticsearch. Then, using Logstash, I created an index that follows this template. Before I can visualize my data in Kibana, I need to create a data view from this index. Every day, a script is launched and Logstash sends new data to my index. My question: every time new data arrives in my index, does the data view created from this index update automatically, along with the data in my dashboard?
The data in Elasticsearch will be seen by Kibana, as long as the data view corresponds to the same index, alias...
Thank you