How to upload a file into Elasticsearch

Hi, I am new to ELK.
I need to upload different types of files into Elasticsearch. Is it possible to do that? If it is, can anyone please show me how, with some sample code? As of now, I am reading each file and sending its data. Out of 6k files in one zip file, I am able to send only around 3k. But I need to attach a file directly, without reading its content.

Welcome to our community! :smiley:
What sort of files are they?

Hi warkolm,
Thanks for the reply.
It is a Linux SOS report zip file consisting of many files (nearly 11k) with different extensions, like txt, conf, rules, xml, File, CRON, CNF, CF, REPO, ALLOW, CFG, etc.

You could use Filebeat for most of it, but it would require a bit of configuration so that it'd extract the right patterns.

Sorry, I don't know much about ELK; I only moved to this project a few days back. Can't we store it as a blob? I need to store all 11k files and then be able to get them back again.

No, Elasticsearch is not a binary store. Everything is converted to JSON.

Ok, so Filebeat is the only way to store all these files?

It sends the data to Elasticsearch.

Check out https://www.elastic.co/products/ for a bit more info on everything.

Ok thanks warkolm for your time.

Hi warkolm,

I am iterating over all the files, reading each one's content, and creating a document in Elasticsearch. Out of 6k files I am able to read and upload only around 3k on the first run, and the number of uploaded files decreases on every subsequent run. On different platforms it inserts a different number of files.
Here is my sample code:

import datetime
import json
import os

import requests

def upload_file_to_es(name='', path=''):
    # elastic_url, headers and count21 are assumed to be defined at module level
    global count21
    if name != '':
        try:
            # Read as text first; fall back to binary mode if decoding fails
            try:
                with open(os.path.join(path, name), 'r') as file_obj:
                    log_file_string = file_obj.read()
            except Exception:
                with open(os.path.join(path, name), 'rb') as file_obj:
                    log_file_string = file_obj.read()

            # Strip the local extraction directory so only the relative path is stored
            file_path = os.path.join(
                path.replace(r"C:\Users\Manikindi_shaik_Noor\Music\100.64.24.138_2020_Apr_15_07_10\nts_sles12_base_200415_0259", ""),
                name
            )

            upload_data = {}
            upload_data["path"] = file_path
            upload_data["file"] = name
            upload_data["content"] = str(log_file_string)  # json.dumps(log_file_string)
            upload_data["ip"] = "0.0.0.3"
            upload_data["exec"] = "execution-ghi-013"
            upload_data["timestamp"] = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%S")

            try:
                resp_elastic = requests.post(
                    elastic_url,
                    headers=headers,
                    data=json.dumps(upload_data),
                    verify=False
                )
                print("ES Entry completed for {}".format(file_path))
                count21 += 1
                print("file number %s" % count21)
            except Exception as e:
                print("ERROR: %s" % e)
                print("ERROR: FILE %s" % file_path)

        except Exception as e:
            print("ERROR: %s" % str(e))
            print("ERROR: FILE %s" % file_path)

Each time, the number of documents that make it into Elasticsearch is different. Any suggestions, please?

I see you are using requests there. I would recommend the official Python client library (and in particular the streaming bulk helper: https://elasticsearch-py.readthedocs.io/en/master/helpers.html#example).
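For illustration only, here is a minimal sketch of that approach, assuming the elasticsearch-py client, a cluster at localhost:9200, and an illustrative index name and directory (none of which come from the original post). The helper reports per-document success or failure, so dropped files become visible instead of silently reducing the count:

import os

from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

es = Elasticsearch(["http://localhost:9200"])  # assumed local cluster

def generate_docs(root_dir):
    """Yield one bulk action per file under root_dir."""
    for dirpath, _, filenames in os.walk(root_dir):
        for fname in filenames:
            full_path = os.path.join(dirpath, fname)
            try:
                with open(full_path, "r", errors="replace") as f:
                    content = f.read()
            except OSError:
                continue
            yield {
                "_index": "sos-files",  # illustrative index name
                "_source": {
                    "path": os.path.relpath(full_path, root_dir),
                    "file": fname,
                    "content": content,
                },
            }

# streaming_bulk yields (ok, result) per document, so failures can be logged
for ok, result in streaming_bulk(es, generate_docs("extracted_sosreport"), raise_on_error=False):
    if not ok:
        print("Failed:", result)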

As stated, Elasticsearch will not do well with raw file contents. You are likely hitting indexing errors on some files, because Elasticsearch has guessed the field mapping (based on the first document) and some file contents will not be valid for that mapping. You can ingest them as blobs (base64-encode them in your Python first) with a non-analysed field mapping if you really need to store them (you cannot search non-analysed fields, however).

The easiest way is to use an index mapping template:
https://www.elastic.co/guide/en/elasticsearch/reference/current/index-templates.html

You need to set enabled to false, as described here (see the sketch below):
https://www.elastic.co/guide/en/elasticsearch/reference/current/enabled.html
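As a sketch only (assuming Elasticsearch 7.8+ composable index templates; the template name, index pattern, and field name are made up for illustration), the template could disable indexing of the raw contents like this:

import json
import requests

# Whatever is stored under "content" is kept in _source but never parsed or
# indexed, because enabled:false applies to the object field as a whole.
template = {
    "index_patterns": ["sos-files*"],   # illustrative pattern
    "template": {
        "mappings": {
            "properties": {
                "content": {"type": "object", "enabled": False}
            }
        }
    }
}

resp = requests.put(
    "http://localhost:9200/_index_template/sos-files",  # assumed local cluster
    headers={"Content-Type": "application/json"},
    data=json.dumps(template),
)
print(resp.json())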

This is a bit of a hacky workaround. I really recommend storing the data outside Elasticsearch and just ingesting a link to the file, e.g. in S3, a UNC share, or something like that. It is not much more code to upload the file from Python to such a store and then ingest the link for search purposes.

Off topic:
The "correct" way to do this is to use something like Tika (https://tika.apache.org/) to parse metadata from the contents and then ingest that in a uniform manner; see the sketch below.

You can use the ingest attachment plugin.

There's an example here: https://www.elastic.co/guide/en/elasticsearch/plugins/current/using-ingest-attachment.html

PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}
PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}
GET my_index/_doc/my_id

The data field is basically the BASE64 representation of your binary file.
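For example, a small Python sketch that base64-encodes a file and sends it through that pipeline (the index name, document id, and pipeline name mirror the snippet above; the cluster URL and file path are illustrative):

import base64
import json
import requests

# Read the file as bytes and base64-encode it for the "data" field
with open("/tmp/sosreport/var/log/messages", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

resp = requests.put(
    "http://localhost:9200/my_index/_doc/my_id?pipeline=attachment",
    headers={"Content-Type": "application/json"},
    data=json.dumps({"data": encoded}),
)
print(resp.json())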

You can use FSCrawler. There's a tutorial to help you get started.
