General question about indexing


(igor.o) #1

Let's suppose that I have two tables in my project's DB (today it's SQL
Server): Documents and Tasks. It was decided that all data inserted into
the DB should also be stored in Elasticsearch.

I am beginning to learn ES and I'm wondering how many indices I should
have: one for all data, or one index for every DB entity (a Documents index
and a Tasks index)?

Thank you in advance,
Igor

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Luca Cavanna) #2

Hi Igor,
the data design phase is really important, and it's mainly driven by how
you are going to use your data. For instance, are you going to get
documents by id? Or run full-text searches? Where are the fields that you
want to search on, in the documents or in the tasks? And how do you want to
get results back? Is there a relation between the two tables?

Those are some of the questions that you'd need to ask yourself to decide
how to structure your data. If you can add some details we'll try and help
you with this.

Cheers
Luca


(igor.o) #3

Hi Luca,

Thank you very much for your answer.

  1. Are you going to get documents by id? No (not today, but who knows what
    we'll want tomorrow?)
  2. Run full-text searches? Of course!
  3. Where are the fields that you want to search on, in the documents or in
    the tasks? In both.
  4. How do you want to get results back? I'd prefer to get them as a list of
    objects defined in our project.
  5. Is there a relation between the two tables? Sure. Every task has one or
    more documents.

Thank you in advance,
Igor


(Luca Cavanna) #4

If the two tables are related, you can either represent that relation in
Elasticsearch or flatten your data. I'd suggest having a look at this
blog post, which explains the options you have and the differences between
them:
http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/

The reason I asked how you want to get the results back is that the options
differ on this point. If you use nested documents, for instance, you can
only retrieve the parent document together with all of its children. With
parent/child, the documents can have separate update lifecycles, but you
can't retrieve both parent and children in the same call. Also, do consider
the "manual" option, flattening your data, which is usually the most
performant one!
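
As a rough illustration of the nested and flattened options above, here is a
minimal Python sketch. The field names (title, body, task_id) and the sample
rows are assumptions, not taken from Igor's schema, and no Elasticsearch
client is involved; it only shows the shape of the JSON you might index in
each case.

```python
from typing import Any

# Hypothetical rows standing in for the Tasks and Documents tables.
task: dict[str, Any] = {"id": 1, "title": "Review contract"}
documents: list[dict[str, Any]] = [
    {"id": 10, "task_id": 1, "body": "First draft of the contract"},
    {"id": 11, "task_id": 1, "body": "Signed copy"},
]


def to_nested(task: dict[str, Any], documents: list[dict[str, Any]]) -> dict[str, Any]:
    """Build one search document per task, with its documents embedded as a list."""
    children = [
        {"id": d["id"], "body": d["body"]}
        for d in documents
        if d["task_id"] == task["id"]
    ]
    return {"task_id": task["id"], "task_title": task["title"], "documents": children}


def to_flattened(task: dict[str, Any], documents: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Build one search document per document row, copying the task fields into each."""
    return [
        {
            "document_id": d["id"],
            "body": d["body"],
            "task_id": task["id"],
            "task_title": task["title"],
        }
        for d in documents
        if d["task_id"] == task["id"]
    ]


nested = to_nested(task, documents)
flat = to_flattened(task, documents)
```

With the embedded shape, a hit is always the whole task with all of its
documents, and the list would need to be mapped as a nested field in
Elasticsearch to be queried as nested documents (that mapping is not shown
here). With the flattened shape, each hit is a single document that already
carries its task's fields, at the cost of re-indexing those copies whenever
the task changes.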

Cheers
Luca

