Parent/Child and index allocation problem

Hi,

I would like to ask about best practice for the following problem:

I have three types of documents:

  • User
  • IncomingLog
  • OutgoingLog

For my mapping I have set up *User* to be the parent of the other two
documents, *IncomingLog* and *OutgoingLog*. Users are my user base, whereas
log documents are historical data. I do this so I can run various
parent/child queries to fetch users that match specific criteria based on
their log history.
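
To illustrate the kind of query meant here, below is a minimal sketch that only builds the JSON body of a `has_child` search and prints it. The `status` field and the value `failed` are hypothetical examples, not part of the actual mapping, and the helper name is made up for this sketch.

```python
import json


def users_with_matching_logs(child_type, field, value):
    """Build a search body that matches parent documents (users) having at
    least one child of `child_type` whose `field` equals `value`."""
    return {
        "query": {
            "has_child": {
                "type": child_type,
                "query": {"term": {field: value}},
            }
        }
    }


# "status" / "failed" are hypothetical; substitute a real log field.
body = users_with_matching_logs("IncomingLog", "status", "failed")
print(json.dumps(body, indent=2))
```

The body would be sent to the search endpoint of the index that holds both the users and their logs.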

What I'm wondering is this: after some time the logs will become huge, so I
need some time-based flow of data, switching to a new index every month,
say, for the log documents. What is the best way of doing that, given that
the users are not historical data and I cannot roll them over to a new
index along with the log docs? I know that parent and child documents are
stored in the same shard. Can we have one index for the users, without
creating a new one each month, and another index for the logs that is
recreated each month, and combine the two with an alias? I have too many
users (around 10 million or even more, I think) to create one index per
user, and it is not obvious to me whether that would be a good thing to do.

What do you suggest?

Thank you in advance.
Thomas

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hello Thomas,

I think you can keep all the data in a single index, and use the TTL field
(http://www.elasticsearch.org/guide/reference/mapping/ttl-field/) to
get rid of the logs after a while.
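
A minimal sketch of what the single-index mapping could look like, built as plain Python dictionaries and printed as JSON. The `90d` default TTL is an assumed value chosen for illustration, and the helper name is hypothetical. Note that `_ttl` was removed in later Elasticsearch releases, so this only applies to versions that still support it.

```python
import json


def log_mapping(log_type, parent_type="User", ttl="90d"):
    """Mapping fragment for one log type: child of `parent_type`, expiring
    after `ttl` unless a document sets its own TTL."""
    return {
        log_type: {
            "_parent": {"type": parent_type},
            "_ttl": {"enabled": True, "default": ttl},  # "90d" is an assumption
        }
    }


mappings = {}
for log_type in ("IncomingLog", "OutgoingLog"):
    mappings.update(log_mapping(log_type))

print(json.dumps(mappings, indent=2))
```

Only the log types get a TTL here, so the users stay in the index while old logs expire.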

Sure, from a performance perspective, it would be nice to have time-based
indices for logs, because searching recent data and deleting old data will
be more efficient. But you have to have users in the same index as the logs
to be able to make the parent-child relationship. You can still make that
happen, but I don't see a straightforward way.

For example, you can have a master index with all your users, and
time-based indices with your logs. Whenever you index a new log, you can
make sure the user it belongs to is indexed in the same index as well. Or,
you can duplicate all your users in each time-based index.
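
As a rough sketch of the first option (copy the user into the monthly index the first time one of its logs lands there), the code below only plans the write operations and does not talk to Elasticsearch. The `logs-YYYY-MM` naming scheme, the function names, and the shape of the operation dictionaries are all assumptions made for illustration.

```python
from datetime import date


def monthly_index(prefix, day):
    """Name of the time-based index for the month containing `day`."""
    return "%s-%04d-%02d" % (prefix, day.year, day.month)


def plan_log_write(user, log, day, seen):
    """Return the index operations needed to store `log` under `user`.

    `seen` is a set of (index, user_id) pairs already copied; the user is
    only re-indexed into a monthly index the first time it is needed there.
    """
    index = monthly_index("logs", day)
    ops = []
    if (index, user["id"]) not in seen:
        ops.append({"index": index, "type": "User",
                    "id": user["id"], "doc": user})
        seen.add((index, user["id"]))
    # The child carries its parent's id so both end up on the same shard.
    ops.append({"index": index, "type": log["type"],
                "parent": user["id"], "doc": log})
    return ops


seen = set()
user = {"id": "u1", "name": "example"}
log = {"type": "IncomingLog", "message": "hello"}
first = plan_log_write(user, log, date(2013, 7, 25), seen)
second = plan_log_write(user, log, date(2013, 7, 26), seen)
next_month = plan_log_write(user, log, date(2013, 8, 1), seen)
```

The trade-off is that a user then exists in several indices, so any change to a user has to be applied to every copy.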

Best regards,
Radu

In reply to Thomas (thomas.bolis@gmail.com), Thu, Jul 25, 2013 at 1:11 PM.

--
http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

Dear Radu,

Thanks for your reply; unfortunately this does not solve my case. What I
was thinking is to split my users into groups, where each user belongs to
exactly one group, so that I get some sort of load balancing of users,
along with their logs, across a set of identical indexes, and then search
across all the indexes with an alias. But again, this only partially solves
my problem, as the number of indexes will be fixed.
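
A minimal sketch of the grouping idea, assuming a stable hash of the user id decides which of a fixed set of indexes holds the user and all of its logs. The number of groups (8), the `users-group-N` naming, and the choice of MD5 are assumptions for illustration only.

```python
import hashlib


def group_index(user_id, num_groups=8, prefix="users-group"):
    """Map a user id to one of `num_groups` identical indexes.

    A digest is used instead of Python's built-in hash() so the result is
    the same across processes and restarts.
    """
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return "%s-%d" % (prefix, int(digest, 16) % num_groups)


# A user and its logs always go to the same index.
target = group_index("user-42")
all_indexes = ["users-group-%d" % n for n in range(8)]
```

Changing `num_groups` later would move users between indexes, which is the fixed-number limitation mentioned above.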

Does this make sense?

T.

In reply to Radu Gheorghe (radu.gheorghe@sematext.com), Thu, Jul 25, 2013 at 1:39 PM.
