Store XML file on S3 and send specific XML tags to Elasticsearch for indexing

(Marcelo Costa) #1

Hi folks

Can I use Logstash to send only some tags from an XML file to Elasticsearch and store the physical XML file on S3?

                     |----> Elasticsearch
XML --> Logstash --> |
                     |----> S3

From the XML file, I need to send only some specific tags to the Elasticsearch index.

To S3, I need to send the complete XML file. In this case, I cannot transform the XML file to JSON because of a government rule: I need to store the file in its original XML format, unchanged.

The XML file is in a directory on my local disk.

I need to store these files on S3 so that, when requested, a user can search in Elasticsearch and download the physical file from S3 through a web page that I will create.

Has anyone dealt with a similar situation?

My output in the Logstash conf file:

output {
  # testing with a local file
  stdout { codec => rubydebug }

  elasticsearch {
    hosts => ["MY ElasticSearch Host"]
    user => "elastictem"
    password => "My Password"
    # Elasticsearch index names must be lowercase
    index => "xmltest"
  }

  s3 {
    access_key_id => "My Access Key"
    secret_access_key => "My Secret Access Key"
    region => "us-east-1"
    bucket => "Testxml1"
    time_file => 1
  }
}


(Mike Barretta) #2

Maybe check out the clone filter.

If you clone incoming docs and use add_tag to differentiate them (e.g. `add_tag => "original"`), then you can have conditional logic in your pipeline to do different things with each: the original goes to S3, and the other is stripped of unnecessary bits and output to Elasticsearch.
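
A rough sketch of what that could look like, not a tested config. It assumes the input already delivers one whole XML document per event in the `message` field (for example via a multiline codec on the file input). The XPath expressions and the field names `doc_id` and `doc_total` are hypothetical placeholders, and the credentials are the placeholders from the question.

```
filter {
  # Copy every event; add_tag is applied to the copy only.
  clone {
    clones  => ["original"]
    add_tag => ["original"]
  }

  if "original" not in [tags] {
    # Indexing copy: keep only the wanted elements, then drop the raw XML.
    xml {
      source    => "message"
      store_xml => false
      xpath => {
        "/document/id/text()"    => "doc_id"     # hypothetical element
        "/document/total/text()" => "doc_total"  # hypothetical element
      }
      remove_field => ["message"]
    }
  }
}

output {
  if "original" in [tags] {
    s3 {
      access_key_id     => "My Access Key"
      secret_access_key => "My Secret Access Key"
      region            => "us-east-1"
      bucket            => "Testxml1"
      time_file         => 1
      # Write the raw XML text rather than a serialized event.
      codec => line { format => "%{message}" }
    }
  } else {
    elasticsearch {
      hosts    => ["MY ElasticSearch Host"]
      user     => "elastictem"
      password => "My Password"
      index    => "xmltest"   # lowercase, as Elasticsearch requires
    }
  }
}
```

Two caveats: the s3 output groups events into time- or size-rotated objects, so several XML documents can land in the same S3 object, and a file read line by line and reassembled is not guaranteed to be byte-identical to the source. If the rule requires an exact copy per file, uploading the files to S3 outside Logstash may be the safer route.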
