What would be the best way to store and query large binary files in ES?


(ankur) #1

Hello all,

I want to store large binary files (such as xyz.exe and pqr.exe) in ES and search within them for particular binary patterns.

e.g.
data = 4DBEEF3B305367C9AEB8DBA22E3CD0ADB8EBB75C829E68DD8EEDDBA5EE5301FC2A2E087C952D325124F3B62AD548D7CD6C2C633F10E57686C6AE288476C0849985F3BE6C7CF19A48992FA121845FA3

search/query for = 829E68DD8EEDDBA
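
Outside of ES, the intended match is a plain substring search over the hex-encoded content. A minimal Python sketch using the sample values above (the function name `find_pattern` is made up for illustration):

```python
# Sample values copied from the example above.
DATA = (
    "4DBEEF3B305367C9AEB8DBA22E3CD0ADB8EBB75C829E68DD8EEDDBA5EE5301FC"
    "2A2E087C952D325124F3B62AD548D7CD6C2C633F10E57686C6AE288476C08499"
    "85F3BE6C7CF19A48992FA121845FA3"
)
PATTERN = "829E68DD8EEDDBA"


def find_pattern(hex_data: str, hex_pattern: str) -> int:
    """Return the offset (in hex characters) of the first match, or -1."""
    # Normalize case so that 'ab' and 'AB' compare equal.
    return hex_data.upper().find(hex_pattern.upper())


offset = find_pattern(DATA, PATTERN)
print("offset in hex characters:", offset)
print("pattern length in hex characters:", len(PATTERN))
```

Note that this sample pattern has an odd number of hex digits, so it ends in the middle of a byte. Searching the hex text handles that naturally, whereas a search over raw bytes would not.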

TO STORE

  1. I am trying to use the attachment plugin to store the base64-encoded data.
    {
      "test": {
        "mappings": {
          "logs": {
            "properties": {
              "file": {
                "type": "attachment"
              }
            }
          }
        }
      }
    }

ALSO

  1. I am trying to store the data as a hex string (4DBEEF3B305367C9AEB8DBA22E3CD......) in a single field.
    {
      "test": {
        "mappings": {
          "logs": {
            "properties": {
              "file": {
                "type": "string"
              }
            }
          }
        }
      }
    }
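
One difference between the two storage options matters for pattern search. A hex string keeps every byte at a fixed two-character position, whereas base64 packs 3 bytes into 4 characters, so the same byte sequence encodes differently depending on where it sits in the file. The Python sketch below illustrates this with a made-up 6-byte needle and zero padding; it only demonstrates the encodings and says nothing about how the attachment plugin indexes content.

```python
import base64

# Hypothetical 6-byte needle (a multiple of 3, so its base64 form has no padding).
needle = bytes.fromhex("829E68DD8EED")
needle_hex = needle.hex().upper()
needle_b64 = base64.b64encode(needle).decode("ascii")

results = []
# Place the same needle at byte offsets 0, 1 and 2 inside a zero-padded blob.
for shift in range(3):
    blob = b"\x00" * shift + needle + b"\x00" * 3
    hex_hit = needle_hex in blob.hex().upper()
    b64_hit = needle_b64 in base64.b64encode(blob).decode("ascii")
    results.append((shift, hex_hit, b64_hit))

for shift, hex_hit, b64_hit in results:
    print(f"shift={shift} found_in_hex={hex_hit} found_in_base64={b64_hit}")
```

In this sketch the hex search finds the needle at every offset, while the naive base64 substring search only finds it when the needle happens to start on a 3-byte boundary.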

TO SEARCH

  1. I am trying a "wildcard" query.
    GET test/logs/_search
    {
      "query": {
        "wildcard": {
          "file": "829E68DD8EEDDBA"
        }
      }
    }

ALSO

  1. I am trying a "query_string" query.
    GET test/logs/_search
    {
      "query": {
        "query_string": {
          "query": "829E68DD8EEDDBA"
        }
      }
    }
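
For a "contains" match, a wildcard query normally needs `*` on both sides of the pattern. The Python sketch below builds such a request body; the helper name `wildcard_body` and its `lowercase` option are made up for illustration. The option is there on the assumption that the field may be analyzed with an analyzer that lowercases terms (the standard analyzer does), in which case an uppercase pattern may not match because wildcard patterns are not analyzed.

```python
import json


def wildcard_body(field: str, hex_pattern: str, lowercase: bool = False) -> dict:
    """Build a wildcard query body matching hex_pattern anywhere in the field."""
    pattern = hex_pattern.lower() if lowercase else hex_pattern
    # Leading and trailing '*' turn the term match into a substring match.
    return {"query": {"wildcard": {field: f"*{pattern}*"}}}


body = wildcard_body("file", "829E68DD8EEDDBA")
print(json.dumps(body, indent=2))
```

A leading `*` means the query cannot use the sorted term dictionary to jump to matching terms and has to examine the indexed terms one by one, which is a likely reason for slow responses on a large index.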

I get the expected results, but these queries take a very long time on a large index.

What would be an alternative approach for this use case?
Should I use an "N-Gram" analyzer, or is there another way to make the search faster?
I am planning to store 1 million binaries, each between 1 KB and 1 MB in size.
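
To make the N-gram idea concrete, here is a toy in-memory sketch in Python. It is not how Elasticsearch implements its n-gram tokenizer; it only illustrates the principle: split each hex string into fixed-length grams, index them, use the grams of the search pattern to narrow down candidate documents, then confirm the candidates with an exact substring check. The gram length of 8 and all names (`ngrams`, `build_index`, `search`) are assumptions for illustration, and the content of "pqr.exe" is made up.

```python
from collections import defaultdict

N = 8  # gram length in hex characters; an assumed value, not a recommendation


def ngrams(text: str, n: int = N) -> set:
    """All overlapping substrings of length n."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(docs: dict) -> dict:
    """Map each gram to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, hex_data in docs.items():
        for gram in ngrams(hex_data):
            index[gram].add(doc_id)
    return index


def search(index: dict, docs: dict, pattern: str) -> list:
    """Ids of documents that contain pattern (needs at least N characters)."""
    grams = ngrams(pattern)
    if not grams:
        raise ValueError(f"pattern must be at least {N} hex characters")
    # A matching document must contain every gram of the pattern.
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Grams can co-occur without being adjacent, so confirm with a real match.
    return sorted(d for d in candidates if pattern in docs[d])


docs = {
    # First 64 hex characters of the sample data from the post.
    "xyz.exe": "4DBEEF3B305367C9AEB8DBA22E3CD0ADB8EBB75C829E68DD8EEDDBA5EE5301FC",
    # Made-up content for a second file.
    "pqr.exe": "00112233445566778899AABBCCDDEEFF00112233445566778899AABBCCDDEEFF",
}
index = build_index(docs)
print(search(index, docs, "829E68DD8EEDDBA"))
```

The trade-off is index size: 1 million files of 1 KB to 1 MB is roughly 1 GB to 1 TB of raw data, hex encoding doubles that, and storing every overlapping gram adds further overhead on top.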

Any help or hints would be appreciated.

Thanks.
Ankur Mathur

