No it's not. The goal of this API is to make backups in the most suitable format possible, not to extract search responses as CSV or JSON.
You'd better use logstash for that.
I'd rather not state the specific use case. However, each customer is split by tenant, and each tenant goes to a different index, which carries a tenantId and a dateTimestamp.
Example Index: tid.555444333.2019-05-13
Each tenant will have a different length of time they want their logs kept in hot storage... So if it is 3 days, on the 4th day we will pipe their first day's worth of logs into cold storage. I want it in a readable format so it can be sent to them upon request.
Any ideas?
Could I query based on tenant and a timestamp, and grab all the logs associated with currentTime - 3 days (in this example)?
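Roughly what I have in mind, as a sketch (I'm assuming the timestamp field is `@timestamp`; host and credentials are placeholders):

```
# Everything for this tenant older than the 3-day hot window.
# A real extraction would need scroll / search_after, this just shows the filter.
curl -u user:password \
  -H 'Content-Type: application/json' \
  "https://production.es.com:9200/tid.555444333.*/_search?size=1000" \
  -d '{
    "query": {
      "range": {
        "@timestamp": { "lt": "now-3d/d" }
      }
    }
  }'
```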
Hard to know for sure, since you don't want to explain the use case fully and I don't know how much performance you're willing to sacrifice to get the stuff back out of ES in a somewhat weird way.
Do consider that most of the time it makes a lot more sense to archive the raw events into a readable format at their point of entry than to get them back out of Elasticsearch through search queries, full scans/dumps, or backups.
By that I mean something like logstash duplicating everything and shipping one copy into cold storage and the other copy into ES. Or whatever is receiving the events in the first place.
A nice example of that I heard recently was a presenter at Elastic{ON} in SFO who explained how their Logstash infrastructure duplicated everything, sending one copy to AWS S3 and the other copy into ES.
Although I would strongly advise against re-extracting everything from ES the way you seem to want to do it, technically it looks like you're looking for a tool I sometimes use: elasticdump.
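For example, dumping one expired daily tenant index out to files would look roughly like this (just a sketch; host, credentials, and output paths are placeholders, and I'm reusing your example index name):

```
# Dump the mapping, then the data, of one daily tenant index to files
elasticdump \
  --input=https://user:password@production.es.com:9200/tid.555444333.2019-05-13 \
  --output=/cold-storage/tid.555444333.2019-05-13.mapping.json \
  --type=mapping

elasticdump \
  --input=https://user:password@production.es.com:9200/tid.555444333.2019-05-13 \
  --output=/cold-storage/tid.555444333.2019-05-13.json \
  --type=data
```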
You're not giving any specific arguments to back up why that would be the case, so it's hard to argue anything without potentially putting my foot in my mouth. That being said, if it's done correctly, no.
Imagine a correctly designed, duplicating, fanning-out Logstash infrastructure with (fast) on-disk buffer queues or in-memory buffer queues, where one side of the fan-out leads to a Logstash cluster that stores events in files on "storage" directly... and the other side leads to a Logstash cluster that indexes events into an Elasticsearch cluster. Something like the sketch below.
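A minimal single-pipeline sketch of that idea (in reality you'd fan out to two separate Logstash clusters with buffer queues in between; the hosts, credentials, paths, and the tenant_id field are placeholders, not your actual setup):

```
# One Logstash pipeline, two outputs: every event is written to cold
# storage on disk AND indexed into Elasticsearch.
input {
  beats { port => 5044 }
}

output {
  # Cold-storage copy: newline-delimited JSON files, one per tenant per day
  file {
    path  => "/cold-storage/%{tenant_id}/%{+YYYY-MM-dd}.ndjson"
    codec => json_lines
  }

  # Hot copy: indexed into Elasticsearch per tenant per day
  elasticsearch {
    hosts    => ["https://production.es.com:9200"]
    user     => "logstash_writer"
    password => "changeme"
    index    => "tid.%{tenant_id}.%{+YYYY-MM-dd}"
  }
}
```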
Which do you think costs more in terms of performance, or is slower: indexing into ES, or dumping to files on storage in a scalable fashion?
I believe that if you do it by ingesting into ES and then reading it all back out through queries, this is by definition slower and more costly than if you divert a copy directly to storage.
So it depends on how you do it, and there are ways to do it very badly, but with the right setup I think you would create more performance issues by doing backups via re-extraction than by doing a proper fan-out to storage of the original events.
Naturally, only writing an event to disk is faster than indexing an event in ES in almost all real-world use cases. So I have no reason to assume your storage flow would be slower than your ES flow, or that it would "slow" your events/sec performance... if done correctly.
I will put this forward to our devs and work out the best solution... It makes more sense to do it your way, given the eventual multi-cluster, multi-tenant environment that we will have, and especially given the climbing number of "re-extractions" we would have to do daily to support the process.
The aim is to reduce cost in ES by reducing the time the logs stay in hot storage, and therefore reducing the total size of the cluster we require to store the logs.
I'm experimenting with the tool, but I am getting "security_exception" errors. I'm assuming this is because I left out the auth for ES. Any idea where I pass in a username and password for ES? I don't see any placeholders for them in the "elasticdump" file.
I do have security enabled since it's licensed, and basic auth was working just fine for me.
It's essentially the same as trying to hit: https://user:password@production.es.com:9200/_cluster/health from your browser.
Does that work for you? I mean, does it fail if you try without user and pass from an incognito Chrome window, and work if you enter the user and password?
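From a shell, the same check would be something like this (credentials and host are placeholders):

```
# Should return the cluster health JSON if basic auth is accepted
curl -u user:password "https://production.es.com:9200/_cluster/health?pretty"
```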
The only thing I see is that you're missing an equals sign after --input (not sure if it supports both forms or not), and you have no index in your input URL.
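For reference, the shape that works for me is roughly the same as the command I pasted above, with the auth in the URL, an equals sign after --input, and an index in the input URL (host, index, and credentials are placeholders):

```
elasticdump \
  --input=https://user:password@production.es.com:9200/tid.555444333.2019-05-13 \
  --output=/backups/tid.555444333.2019-05-13.json \
  --type=data
```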
You're also on Elastic Cloud and I'm not; not sure if that changes anything, but I would bet not.
Do you have strange characters in your user or password that are messing up the shell?
Sorry I don't know how to help you, it works for me.