Hi,
I am currently developing a web site in .NET + SQL Server and, in order to
make search much faster, I would like to use Elasticsearch.
I am totally new to ES and, at the moment, I am only validating the concept.
This is why I would like your expert advice.
Let me first talk about my data structure (in SQL Server):
The database is of course relational and made up of 100+ tables.
Data is linked to 2 main topics: advertisers and products.
An advertiser might have 1 to many business fields and sell many products.
A product might be linked to dates of availability.
The website is multi-lingual (7 languages).
A simplified schema, for the purpose of this explanation, would be:
Table "tbl_advertisers" => records all information linked to an advertiser
for a specific business field
Table "tbl_products" => records all information linked to a product,
for a specific advertiser
Table "tbl_availability" => records one row per day between 2 limit dates;
each row states whether the product is available on that day and gives its
daily price.
Table "tbl_texts" => records the texts linked to either an
advertiser or a product (1 record per language and advertiser business field,
OR 1 record per language and product). Each record might be up to 6000
characters long.
Table "tbl_addresses" => records the addresses for both advertisers and
products
Question #1 (SQL Server + Elasticsearch: synchronization)
Today, in order to make sure the database is always consistent, when I need
to apply any kind of modifications (insert, update, delete) to an
advertiser or a product, all changes are done within 1 single Stored
Procedure.
Thanks to the global transaction at the stored procedure level, I am always
sure that when a user runs a query against the web site, results which are
returned are always in-sync with the database.
Since I would need to apply the updates in Elasticsearch as well, I intend
to complement the stored procedure with HTTP POST requests towards
Elasticsearch (as many as necessary, given the 8K limit on each HTTP
POST).
Therefore I would end up with something like:
Begin Transaction Main
    Begin Transaction SQLServer
        -- Apply all inserts, updates and deletes in SQL Server
    Commit Transaction SQLServer
    -- Do all the HTTP POST requests towards Elasticsearch
Commit Transaction Main
This way, if something goes wrong with ES, I simply roll back. However, this
solution would not fully prevent a loss of synchronization (for example a
network failure during one POST, or a failure at ES level), unless ES offers
some kind of transaction mechanism as well. Does it?
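To make the flow above concrete, here is a rough sketch of what I have in mind. It is written in Python only for brevity (the real code would live in the stored procedure or in .NET), it uses an in-memory SQLite database as a stand-in for SQL Server, and `post_to_es` is a hypothetical callable standing for the HTTP POST requests, so please read it as an illustration and not as working integration code.

```python
import sqlite3


def save_product(conn, product, post_to_es):
    """Write the product to the database, then push it to the search index.

    The database commit only happens if the push did not raise.
    `post_to_es` is a hypothetical callable standing for the HTTP POSTs.
    """
    try:
        conn.execute(
            "INSERT OR REPLACE INTO tbl_products (id, name) VALUES (?, ?)",
            (product["id"], product["name"]),
        )
        post_to_es(product)  # may raise on network or ES failure
        conn.commit()
        return True
    except Exception:
        conn.rollback()  # undo the database change
        return False


# Stand-in database (SQLite instead of SQL Server, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_products (id INTEGER PRIMARY KEY, name TEXT)")
conn.commit()


def failing_post(doc):
    raise ConnectionError("simulated network failure")


indexed = []  # pretend search index: successful pushes are appended here
ok = save_product(conn, {"id": 1, "name": "Room A"}, indexed.append)
failed = save_product(conn, {"id": 2, "name": "Room B"}, failing_post)
rows = conn.execute("SELECT id FROM tbl_products ORDER BY id").fetchall()
```

In this sketch the second product is rolled back in the database because its push failed. What the sketch cannot cover is the opposite case: the push succeeds (or succeeds although the response is lost) and the database commit then fails, which leaves ES ahead of the database.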
I read about the River, but this would introduce a delay between the SQL
Server database and Elasticsearch, plus an additional risk of
de-synchronization if something goes wrong during the update of Elasticsearch.
==> What would be your advice regarding all this?
Question #2 (how to structure the information in Elasticsearch)
At first, I thought of considering one document to store all information
linked to advertisers, and one document for the products.
Both documents would also contain the information linked to addresses.
(a) But what about the texts? Would you create 1 field per language or
create another document to store the texts?
(b) What about availabilities? How would it be possible to link the
products with the availabilities?
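To illustrate (a) and (b), here is one layout I am considering for a product document: one text field per language, and the availabilities embedded as an array of objects. All field names (`texts`, `availability`, `lang`, and so on) are made up for this example, and I do not know whether embedding the availabilities like this is the recommended approach or whether it requires a special mapping. That is part of my question.

```python
def build_product_document(product, texts, availability):
    """Assemble one product document from rows of the relational tables.

    All field names are hypothetical and only illustrate the layout.
    """
    doc = {
        "id": product["id"],
        "advertiser_id": product["advertiser_id"],
        "address": product["address"],
        "texts": {},         # option (a): one field per language
        "availability": [],  # option (b): one embedded object per day
    }
    for row in texts:
        doc["texts"][row["lang"]] = row["text"]
    for row in availability:
        doc["availability"].append(
            {
                "date": row["date"],
                "available": row["available"],
                "price": row["price"],
            }
        )
    return doc


doc = build_product_document(
    {"id": 42, "advertiser_id": 7, "address": {"city": "Brussels"}},
    [
        {"lang": "en", "text": "Quiet room"},
        {"lang": "fr", "text": "Chambre calme"},
    ],
    [
        {"date": "2013-07-01", "available": True, "price": 80.0},
        {"date": "2013-07-02", "available": False, "price": 80.0},
    ],
)
```

With this layout a product and its texts and availabilities travel together in one document, at the cost of re-sending the whole document whenever a single day changes.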
For (a), I thought of sending 1 HTTP Post (see Question #1) to update (or
create) the advertiser or product, then 7 HTTP Posts to update and save the
texts (1 post per language in order not to go beyond the 8K limitation).
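As an illustration of how the POSTs could be split to respect the 8K limit, here is a small Python sketch that packs documents into request bodies in the newline-delimited format of the ES bulk API (one action line followed by one source line per document). The index name `texts`, the document ids and the helper name `bulk_bodies` are assumptions made for the example, and depending on the ES version the action line may also need a `_type`.

```python
import json

MAX_BODY_BYTES = 8000  # the 8K limit per HTTP POST mentioned above


def bulk_bodies(index, docs, max_bytes=MAX_BODY_BYTES):
    """Yield newline-delimited bulk bodies, each at most max_bytes long."""
    body = b""
    for doc in docs:
        action = json.dumps({"index": {"_index": index, "_id": doc["id"]}})
        source = json.dumps(doc, ensure_ascii=False)
        entry = (action + "\n" + source + "\n").encode("utf-8")
        if len(entry) > max_bytes:
            raise ValueError(
                f"document {doc['id']} alone exceeds {max_bytes} bytes"
            )
        # Start a new body when adding this entry would break the limit.
        if body and len(body) + len(entry) > max_bytes:
            yield body
            body = b""
        body += entry
    if body:
        yield body


# Seven texts of 6000 ASCII characters each, one per language.
languages = ["en", "fr", "nl", "de", "es", "it", "pt"]
text_docs = [
    {"id": f"p42-{lang}", "lang": lang, "text": "x" * 6000}
    for lang in languages
]
bodies = list(bulk_bodies("texts", text_docs))
```

With texts of 6000 ASCII characters, each body holds exactly one document, which matches the 7 POSTs I mentioned. Texts with accented characters take more than one byte per character in UTF-8, so a 6000-character text could exceed the limit on its own, in which case the sketch raises an error.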
One additional piece of information: when users query the website, they might
look for advertisers only, for products only, or for everything (products and
advertisers) that matches the search criteria.
==> What would be your advice regarding all this?
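To show what I mean by these three search scopes, here is a small Python sketch that builds the search request for each case. It assumes two indices named `advertisers` and `products` and a per-language text field such as `texts.fr`. These names are hypothetical and the query is deliberately minimal. As far as I understand, several indices can be searched at once by listing them comma-separated in the URL.

```python
import json

# Hypothetical index names, one per kind of document.
INDEX_BY_SCOPE = {
    "advertisers": "advertisers",
    "products": "products",
    "all": "advertisers,products",
}


def build_search(scope, lang, terms):
    """Return the URL path and JSON body of a search request."""
    path = f"/{INDEX_BY_SCOPE[scope]}/_search"
    body = {"query": {"match": {f"texts.{lang}": terms}}}
    return path, json.dumps(body)


path_all, body_all = build_search("all", "fr", "chambre calme")
path_products, _ = build_search("products", "en", "quiet room")
```

Here `path_all` targets both indices in one request, while `path_products` is restricted to products only.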
Many thanks in advance for your help.