Best solution for keeping data of many clients


(Marcin Dojwa) #1

Hi,

I have the following problem. I have to design a system based on ES that
keeps the data of many clients, each with many documents. Here are the details.

  1. It must keep data for up to 1 million clients.
  2. I have 3 types of data (e.g. chats, login_data, left_messages).
  3. Each client has up to 10 million documents of each type.

My current design looks like this:
/<client_number>/<document_type>/<document_id> (e.g. /1231/chats/XSAuxaS)
This means up to 1 million indices. Already with 640 indices I run into
"too many open files" errors, even though I have raised the limit to 65535. I
believe this is not the right way to design it, right? :)

So I see 2 alternatives:

  1. Design it like this: /<document_type>/<client_number>/<document_id>.
    This gives me 3 indices but up to 1 million document types. Would this still
    cause the "too many open files" problem, or any other problems?
  2. Put all clients' data into one document type, e.g. put all clients' chats
    into /anything/chats/<document_id>. But this gives me a really huge number of
    documents in a single index and document type.

The important thing is that I always query for a specific <client_number> and
<document_type>, so I do not need the data in a common bucket.

So, what is the best solution using ES in this case? :) If you need more
information, let me know. Thanks for any help.

Best regards.
Marcin Dojwa.


(Lukáš Vlček) #2

Hi,

Have you seen Shay's talk [1], where he explains index aliasing [2] with
the routing and filter features? The "users" data flow (starting at slide 20/44)
sounds like what you are looking for.

[1] http://www.elasticsearch.org/videos/2012/06/05/big-data-search-and-analytics.html
[2] http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html
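
A minimal sketch of what the alias approach could look like for this use case, assuming one shared index per document type and a `client_id` field stored in every document. The field name, the alias naming scheme and the two helper functions are hypothetical, made up for illustration. The body shape (`actions`, `add`, `index`, `alias`, `filter`, `routing`) follows the index aliases API linked in [2]. The script only builds and prints the JSON body that would be POSTed to `/_aliases`; it does not talk to a cluster.

```python
import json


def client_alias_action(index, client_id, field="client_id"):
    """Build one 'add' action for the _aliases request body.

    `field` is an assumed name for the document field holding the client number.
    """
    return {
        "add": {
            "index": index,
            # Hypothetical naming scheme: one alias per (document type, client).
            "alias": f"{index}_client_{client_id}",
            # Only this client's documents are visible through the alias.
            "filter": {"term": {field: client_id}},
            # Routing value applied to indexing and searching through the alias,
            # so one client's documents end up on a single shard.
            "routing": str(client_id),
        }
    }


def aliases_body(client_id, indices=("chats", "login_data", "left_messages")):
    """Build the full request body: one filtered, routed alias per index."""
    return {"actions": [client_alias_action(i, client_id) for i in indices]}


if __name__ == "__main__":
    print(json.dumps(aliases_body(1231), indent=2))
```

With aliases like these, the application would index and search against `chats_client_1231` as if it were the client's own index, while the cluster holds only three real indices.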

Regards,
Lukas

On Wed, Aug 8, 2012 at 4:03 PM, Marcin Dojwa m.dojwa@livechatinc.com wrote:

> So, what is the best solution using ES in this case?

(Marcin Dojwa) #3

Thank you, I will check this out.

Best regards.



(Marcin Dojwa) #4

Thank you very much, I think this solves all my problems :)

Best regards.


