Is there a way to reindex data in ES from start to now()?

fallenreaper · February 18, 2020, 4:07pm

I figured there was a built-in way to say: "Hey master node, reindex on propertyA, propertyB, propertyC for independent, faster lookups."

My ES database is now about 250 GB on a single dual-purpose master/data node. Kibana queries were taking more than 30 seconds, so it sounds like it might be time to start optimizing.

While I can tweak my Logstash instances to pass tweaked data, I was not sure whether it is possible to have Elasticsearch go over the existing data and reindex it by more keywords, etc.

I can likely update my grok filters in Logstash:

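filter {
	# Extract the file name (without the .txt extension) from the source path
	grok {
		match => [ "path", "%{GREEDYDATA}/%{GREEDYDATA:filename}\.txt"]
	}
	# Split the message at the first ":" or ";" into sampleinfo and backupinfo
	grok {
		match => {
			"message" => "%{DATA:sampleinfo}[:;]%{GREEDYDATA:backupinfo}"
		}
	}
	# Strip newlines, carriage returns, and tabs from backupinfo
	mutate {
		gsub => ["backupinfo", "[\n\r\t]", ""]
	}
}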

But I wasn't sure if I can do this from within Elasticsearch.

Something like: starting now(), reindex all X, Y, Z and turn them into A, B, C. I figured that once all Logstash instances are updated, they will start ingesting the correct information, so I would just need to update all documents from the oldest entry to now().

I will make a follow-up post in the Logstash forum on how to update Logstash. The fields that need to be indexed for the fastest lookup are: timestamp, filename, sampleinfo, backupinfo.

I presume there was a way to redefine a property so that it is indexed differently. All the data are just variable-length character strings.

fallenreaper · February 18, 2020, 11:15pm

Can I execute a command in Elasticsearch to reindex everything while also ingesting new data? I was not sure whether I could execute a function that reindexes a property from type X to type Y. I presume there is a way to define how the stored properties (sampleinfo, backupinfo, and the timestamp bucketing) are indexed and reindex them to something else for easier lookups?

spinscale · February 19, 2020, 2:27pm

Take a look at the reindex API, which can be run while new data is being indexed.
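As a rough illustration of the suggestion above, here is a minimal sketch of a request body for `POST _reindex` that copies every document with a timestamp up to the moment the request runs. The index names `logs-old` and `logs-v2` are hypothetical, and the field name `@timestamp` is an assumption (it is the Logstash default, but your documents may use something else). The script only builds and prints the JSON; sending it to the cluster is left to whatever client you use.

```python
import json


def build_reindex_body(source_index, dest_index, timestamp_field="@timestamp"):
    """Build a body for POST _reindex that copies documents up to the current time."""
    return {
        "source": {
            "index": source_index,
            # "now" is Elasticsearch date math; it resolves on the server
            # when the request is executed.
            "query": {"range": {timestamp_field: {"lte": "now"}}},
        },
        "dest": {"index": dest_index},
    }


# Hypothetical index names, for illustration only.
body = build_reindex_body("logs-old", "logs-v2")
print(json.dumps(body, indent=2))
```

The idea would be that the updated Logstash instances write new events straight to the destination index while the reindex copies the older documents into it.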

fallenreaper · February 19, 2020, 4:38pm

I might be using the wrong words here. It seems that the API is for moving data from cluster "foo" to a new cluster "bar", I think? That will be useful later for sure.

I was thinking that the way some of my fields are defined may be why the result set is really slow to fetch when querying. I was thinking there may be a command to update the property type so that queries would find results faster?

I'm not sure how ES handles typing. I want to streamline the ES instance specifically around these fields: timestamp (built in), filename, sampleinfo, backupinfo.

spinscale · February 20, 2020, 7:42am

Hey,

Reindex from remote is only a part of reindex; you can also specify another index within your cluster.
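To make the distinction concrete, here is a hedged sketch of one way this could look for the fields named in the question: create a new index with an explicit mapping, then reindex into it inside the same cluster. The index names and the remote host are hypothetical, the mapping uses the typeless 7.x format, and mapping the three string fields as `keyword` is an assumption. `keyword` suits exact-match filters and aggregations, but it may not be the right choice for a long free-text field such as `backupinfo`, and it is not guaranteed to fix slow queries.

```python
import json

# Fields from the question that should be looked up by exact value.
KEYWORD_FIELDS = ["filename", "sampleinfo", "backupinfo"]


def build_mapping(keyword_fields, timestamp_field="@timestamp"):
    """Body for PUT /<new-index>: keyword fields plus a date field."""
    properties = {name: {"type": "keyword"} for name in keyword_fields}
    properties[timestamp_field] = {"type": "date"}
    return {"mappings": {"properties": properties}}


def build_local_reindex(source_index, dest_index):
    """Body for POST _reindex between two indices in the same cluster."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def build_remote_reindex(remote_host, source_index, dest_index):
    """Body for POST _reindex that pulls from another cluster (reindex from remote)."""
    return {
        "source": {"remote": {"host": remote_host}, "index": source_index},
        "dest": {"index": dest_index},
    }


# Hypothetical names, for illustration only.
mapping = build_mapping(KEYWORD_FIELDS)
local = build_local_reindex("logs-old", "logs-v2")
remote = build_remote_reindex("http://otherhost:9200", "logs-old", "logs-v2")

for label, payload in [("mapping", mapping), ("local", local), ("remote", remote)]:
    print(label, json.dumps(payload))
```

The only difference between the local and remote bodies is the `remote` block under `source`; without it, both indices are taken from the cluster that receives the request.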

system · March 19, 2020, 7:42am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
