Split data into multiple indices automatically on insert


#1

Hi guys,

What I'd like is this: when I insert my data, which has a timestamp field, it should automatically be split across weekly indices, and if necessary a new index with a pre-defined mapping should be created.

Is there a way to do that with ES config, or do I have to do it in business logic outside Elasticsearch?


(Aditya Budi Utomo) #2

If you use multiple nodes in a server, Elasticsearch splits the data automatically; you only have to configure the same cluster_name.


#3

Thanks for your reply. Maybe you misunderstood me.

Let's say I have data from 2016.01.01 to 2016.01.20.
When I start to insert my data, I want ES to create an index named mytype-20160101 that stores all the data where the timestamp field is less than 2016.01.08 00:00:00.
Next, it should create a new index named mytype-20160108 that stores all the data where the timestamp field is between 2016.01.08 00:00:00 and 2016.01.15 00:00:00
... and so on.

If I use Logstash for inserting the data, can I achieve this?


(Magnus Bäck) #4

When I start to insert my data, I want ES to create an index named mytype-20160101 that stores all the data where the timestamp field is less than 2016.01.08 00:00:00.

ES won't do this automatically. You have to tell it which index to store a new document in.

If I use Logstash for inserting the data, can I achieve this?

Yes.
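
For what it's worth, here's a minimal sketch of what that could look like on the Logstash side, assuming your events already carry a parsed @timestamp (the index name pattern and host are illustrative):

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        # %{+xxxx.ww} expands to the ISO week-year and week number of
        # @timestamp (e.g. mytype-2016.02), giving one index per week.
        index => "mytype-%{+xxxx.ww}"
    }
}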


#5

Thanks for your reply, Magnus. I'm importing my data with Logstash, and I want it to go into monthly indices. I have a date field called createDate, and that's what I want to use.

Here's my config:

input {
    jdbc {
        jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
        jdbc_user => "user"
        jdbc_password => "pass"
        jdbc_driver_library => "mysql-connector-java-5.1.38-bin.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_paging_enabled => true
        jdbc_page_size => 2
        # just for testing: fetch only the first ten rows
        statement => "SELECT * FROM mytable LIMIT 10 OFFSET 0"
    }
}

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        # here's the trick: reference the field with sprintf syntax
        index => "my-index-%{myformatteddate}"
        document_type => "mydoc"
    }
}

What filter should I apply, and how, to create a new field myformatteddate that contains the year and month of createDate?


(Magnus Bäck) #6

One typically uses the date filter to parse date strings and convert them to ISO8601 format in UTC. Then you can use the %{+YYYY.MM.dd} notation in the index name to have Logstash insert the date from the @timestamp field into the index name. Exactly what the date filter should look like depends on what the createDate field looks like; I haven't used the jdbc input, so I'm not sure what happens with date fields. Use a stdout { codec => rubydebug } output until you get what you want, and don't jump into Elasticsearch prematurely.
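
A minimal sketch of how those pieces could fit together, assuming createDate arrives from the jdbc input as a string like "2016-01-05 13:42:00" (the match pattern is an assumption; check the rubydebug output and adjust it to the actual format):

filter {
    # Parse createDate into @timestamp (ISO8601, UTC).
    date {
        match => ["createDate", "yyyy-MM-dd HH:mm:ss"]
        timezone => "UTC"
    }
}

output {
    # Inspect events here first; switch to elasticsearch once they look right.
    stdout { codec => rubydebug }

    elasticsearch {
        hosts => ["localhost:9200"]
        # %{+YYYY.MM} expands to the year and month of @timestamp,
        # e.g. my-index-2016.01, giving one index per month. No separate
        # myformatteddate field is needed.
        index => "my-index-%{+YYYY.MM}"
        document_type => "mydoc"
    }
}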

