Including data from SQL tables to Elasticsearch

Good morning. I would like to know if it would be possible, in order to increase the performance of queries in a SQL database, to perform a data dump from the SQL database to elasticsearch, is that possible, or is it only possible to include data in non-SQL format? Thank you

You need to send your existing data to elasticsearch.
That means:

  • Read from the database (SELECT * from TABLE)
  • Convert each record to a JSON Document
  • Send the json document to elasticsearch, preferably using the _bulk API.

Logstash can help for that. But I'd recommend modifying the application layer if possible and send data to elasticsearch in the same "transaction" as you are sending your data to the database.

I shared most of my thoughts there: http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/

Have also a look at this "live coding" recording.

Thank you for your quick reply David.
I just need to know how I should treat the relationship between tables when entering the data in elasticsearch, I have seen that there is a field called "nested" but it is not entirely clear to me in which cases to use it... According to what I saw, when you want to query an array of objects associated to an object, you have to use the "nested" field so that when you flatten the array, the existing relationship between the data is not lost, is that right? If so, for example, if I wanted to store in elasticsearch an object that represented a house and in that object an array of objects that included the different appliances it includes (and their technical specifications) so that I could later search for those houses that had an appliance of a specific year, should I include the array of appliances as an object of type "nested" when declaring the index?
If so, would it be counterproductive to use the "nested" type versus a simple "object" type in the future if the volume of data increases too much?
Thanks in advance for your answer.
Best regards!

I'd totally forget about the existing model and would just think about the usage to build the right "search" objects for my use case.
Here I can't really answer as I don't know the use case.

Basically I'd recommend to ask yourself 2 questions:

  • What kind of objects my users want to get back as a response? If it's object X, then just index object X
  • What typical attributes my users want to search for? Let say I need attribute a, b and c, just index those attributes within object X whatever the original source of those attributes is.

Most of the time you will need to denormalize your data. Doing joins at index time is one of the key.

Ok, thank you very much for your answer David, you have cleared up my doubts. Best regards

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.