Hi,
I am a newbie in the ES world, with just a few days of reading and some
hands-on experience.
I have a scenario where I have a huge XML document which contains customer data.
The data can logically be split into sections like Profile Details, Contact
Details, Banking Details, etc.
What would be the best indexing strategy for such a document?
Should I split the logical chunks into separate indexes (e.g. an index
for Personal Details, one for Banking Details, etc.) and use hypermedia links to
link these documents back together?
Or should I just have one document (the same XML which I receive), index
the fields, and play with the boost factor?
My end goal is to make search as fast as possible.
Could anyone help me with the best approach in such a scenario?
It would also help if someone could point me to any blogs or literature where this has
been discussed before.
If you expect your search to always work across the entire document, then it
probably makes sense to use one index and shard it for efficiency. If you
know you might search only a certain section of the document, then I think
you can create separate indexes to match the sections.
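To illustrate the two options, here is a minimal sketch that only builds the request bodies and paths and does not talk to a cluster. The index names (`customers`, `customer_profile`, `customer_contact`, `customer_banking`) and the helper functions are made up for illustration; `number_of_shards` and `number_of_replicas` are the standard index settings, and several indexes can be searched at once by listing them comma-separated in the path.

```python
import json

# Hypothetical index names, used only for illustration.
SINGLE_INDEX = "customers"
SECTION_INDEXES = ["customer_profile", "customer_contact", "customer_banking"]


def create_index_body(shards, replicas=1):
    """Body for a create-index request; the shard count is chosen at creation time."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


def search_path(indexes):
    """Build the _search path; several indexes are given as a comma-separated list."""
    return "/" + ",".join(indexes) + "/_search"


# Option 1: one sharded index that holds whole customer documents.
print("PUT /" + SINGLE_INDEX, json.dumps(create_index_body(shards=5)))
print("whole-document search:", search_path([SINGLE_INDEX]))

# Option 2: one index per section, searched alone or in combination.
print("banking-only search:", search_path(["customer_banking"]))
print("profile + contact search:", search_path(SECTION_INDEXES[:2]))
```

The shard and replica counts above are placeholders, not recommendations; the right values depend on data volume and cluster size.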
Thanks for the response Hawk Eye.
The requirement is that search can be done on multiple fields, e.g. Customer Name,
Passport No, address, email address, etc.
But the result of any search should return the entire document.
Given this scenario, would it make sense to have multiple documents and
then write a wrapper which merges these docs to return one
document?
regards,
-Parag
On Wednesday, 20 June 2012 21:07:42 UTC+2, Hawk Eye wrote:
If you expect your search to always work across the entire document, then
it probably makes sense to use one index and shard it for efficiency. If
you know you might search only a certain section of the document, then
I think you can create separate indexes to match the sections.
Why not index each customer's data as a single document? That's what you are after, no?
Search by any detail of a customer (contact, profile, ...) and get back
the customer?
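A rough sketch of that idea, under several assumptions: the XML element names below are invented (the real structure is only shown in a screenshot later in the thread), the field list and the `^2` boost on the name are arbitrary choices, and the code only builds the JSON that would be sent to Elasticsearch. It converts one customer XML into one nested document and builds a `multi_match` query that looks across several sections at once.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML: element names are assumptions, not the poster's real schema.
SAMPLE_XML = """
<customer id="42">
  <personalDetails>
    <name>Jane Doe</name>
    <passportNo>X1234567</passportNo>
  </personalDetails>
  <address>
    <city>Berlin</city>
    <email>jane@example.com</email>
  </address>
  <bankingDetails>
    <accountNo>0000123456</accountNo>
  </bankingDetails>
</customer>
"""


def element_to_dict(elem):
    """Convert an element to nested dicts; a leaf becomes its stripped text.

    Repeated sibling tags would overwrite each other here; a real converter
    would have to collect them into lists.
    """
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    return {child.tag: element_to_dict(child) for child in children}


def customer_document(xml_text):
    """Return (document id, document body) for one customer XML."""
    root = ET.fromstring(xml_text.strip())
    return root.get("id"), element_to_dict(root)


def search_body(text):
    """Query several fields of the single document; the name is boosted (assumed weight)."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": [
                    "personalDetails.name^2",
                    "personalDetails.passportNo",
                    "address.city",
                    "address.email",
                ],
            }
        }
    }


doc_id, doc = customer_document(SAMPLE_XML)
print("document id:", doc_id)
print(json.dumps(doc, indent=2))
print(json.dumps(search_body("Jane Doe"), indent=2))
```

Because every hit is the whole customer document, the search response already carries everything that has to be returned, with no merge step.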
Hi,
Thanks a lot for the response.
To make my query clear, let's look at the XML structure below:
https://lh6.googleusercontent.com/-hX1frBZvOXE/T-SKtpszUhI/AAAAAAAAHfs/pbegB1SCZqg/s1600/Structure.png
Here, I can logically split the XML into 3 parts: Personal Details, Address
and Banking Details.
In the real world, I should be in a position to search on any of the
fields.
If I index everything as a single document, then when I do a
search I can return the entire document with almost no overhead.
If I split the document into multiple docs (Personal Details, Address and
Banking Details), then I need to add some complexity:
- Split the XML into smaller docs when indexing.
- Maintain links between them so that I know which customer each doc
  belongs to.
- On a search, my requirement is to return the original document,
  which means I need to merge the sub-parts of the doc and recreate
  the document.
Based on this scenario, would you still recommend that we split it into
multiple documents?
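To make the extra work of the split approach concrete, here is a small sketch of the split, link and merge steps. Everything in it is an assumption for illustration: the section names, the `customer_id` link field, and both helper functions are hypothetical, and plain dicts stand in for the parsed XML.

```python
def split_customer(customer_id, doc):
    """Split one customer document into one sub-document per section.

    Each part carries the customer id so the parts can be linked again later.
    """
    return [
        {"customer_id": customer_id, "section": section, "data": data}
        for section, data in doc.items()
    ]


def merge_customer(parts):
    """Rebuild the original document from parts that share one customer id."""
    ids = {part["customer_id"] for part in parts}
    if len(ids) != 1:
        raise ValueError("parts belong to different customers")
    return {part["section"]: part["data"] for part in parts}


# Hypothetical customer, already converted from XML to a dict.
original = {
    "personalDetails": {"name": "Jane Doe", "passportNo": "X1234567"},
    "address": {"city": "Berlin", "email": "jane@example.com"},
    "bankingDetails": {"accountNo": "0000123456"},
}

parts = split_customer("42", original)
print(len(parts), "sub-documents to index")

# A hit on one part (say the address) still needs the sibling parts fetched
# by customer_id before the full document can be returned.
rebuilt = merge_customer(parts)
print("round trip ok:", rebuilt == original)
```

With a single document per customer, none of this code is needed, since the stored `_source` of a hit is already the full document.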
regards,
-Parag
On Thursday, 21 June 2012 09:10:55 UTC+2, kimchy wrote:
Why not index each customer's data as a single document? That's what you are
after, no? Search by any detail of a customer (contact, profile, ...) and get
back the customer?