We are currently running an Elasticsearch cluster with a fairly complex enrichment pipeline.
At the moment the enrichment runs as a separate Python process that pulls data from ES, processes it and puts it back.
The process performs several enrichments, such as splitting URLs into components, IP geolocation using MaxMind's DB, MAC address vendor resolution etc., all of which rely on external data sources.
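To give an idea of the kind of logic involved, here is a simplified sketch of two of these enrichments in plain Python 3. It is an illustration only, not our actual code: the function names are hypothetical and the vendor table entries are made up (the real table would come from an external file).

```python
from urllib.parse import urlsplit

# Made-up excerpt of a prefix-to-vendor table, for illustration only.
# In practice this would be loaded from an external data source.
OUI_VENDORS = {
    "AA:BB:CC": "Example Vendor A",
    "DD:EE:FF": "Example Vendor B",
}


def split_url(url):
    """Split a URL into components that can be stored as separate fields."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
    }


def mac_vendor(mac):
    """Resolve a MAC address to a vendor using its first three octets."""
    # Normalise "aa-bb-cc-..." and "aa:bb:cc:..." to "AA:BB:CC".
    prefix = mac.upper().replace("-", ":")[:8]
    return OUI_VENDORS.get(prefix, "unknown")
```

A document with a url field and a mac field would then get the dict returned by split_url(doc["url"]) and the string returned by mac_vendor(doc["mac"]) added to it.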
With the ingest node capability coming in ES 5, I would be interested in moving the whole enrichment into an ingest node. However, not all of the functionality we implemented is currently available in an ingest pipeline.
My question is: if I had to write a script using the lang-python module to implement the missing functionality, for example URL splitting and MAC vendor resolution, would it be possible, from within the script, to:
import Python modules (with the usual sys.path.append('/module/path/'))? If so, where should I put the module code? $ES_HOME/config/script/modules?
open and read files on disk, and where should I put them?
access external data sources, for example through an HTTP REST call? Or is the sandboxed code somehow forbidden to open connections?
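For reference, this is roughly what those three operations look like in plain Python 3 outside Elasticsearch. It is only a sketch: the function names and the tab-separated file format are invented for illustration, the path is a placeholder, and the Python version available to lang-python may differ. What I don't know is whether the equivalent is permitted inside the script.

```python
import json
import sys
from urllib.request import urlopen

# 1. Make our own modules importable from a custom location (placeholder path).
sys.path.append("/module/path/")


def load_vendor_file(path):
    """2. Read a tab-separated 'prefix<TAB>vendor' file from disk into a dict."""
    vendors = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            prefix, vendor = line.split("\t", 1)
            vendors[prefix.upper()] = vendor
    return vendors


def fetch_json(url, timeout=5):
    """3. Call an external REST endpoint and decode its JSON reply."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```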
Thanks in advance for any advice. I tried to search for this information, but unfortunately the results get mixed up with questions about the Python ES client library.