Hi,
I have the following problem. I have to design a system based on ES that keeps
the data of many clients, each with many documents. Here are the details:
- It must keep the data of up to 1 million clients.
- I have 3 types of data (e.g. chats, login_data, left_messages).
- Each client has up to 10 million documents of each type.
My current design looks like this:
/<client_number>/<document_type>/<document_id> (eg. /1231/chats/XSAuxaS)
This gives me up to 1 million indices. Currently, with only 640 indices, I
already get "too many open files" errors, even though I have raised the limit
to 65535. I believe this is not the right way to design it, right?
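Each index is split into shards, and each shard is a separate Lucene index with its own files, so the open-file count grows with the number of shards rather than the number of documents. A rough count for the layouts in this thread, assuming every index keeps the ES defaults of that era (5 primary shards, 1 replica; adjust if your settings differ):

```python
# Back-of-the-envelope shard counts for the layouts discussed in this thread.
# Assumption: every index uses 5 primary shards and 1 replica (the old ES
# defaults); change these constants if your indices are configured differently.
PRIMARY_SHARDS = 5
REPLICAS = 1


def primary_shards(num_indices: int, primaries: int = PRIMARY_SHARDS) -> int:
    """Primary shards across the cluster; each one is a separate Lucene index."""
    return num_indices * primaries


def shard_copies(num_indices: int, primaries: int = PRIMARY_SHARDS,
                 replicas: int = REPLICAS) -> int:
    """Primaries plus their replica copies across the cluster."""
    return num_indices * primaries * (1 + replicas)


layouts = {
    "index per client, today (640 clients)": 640,
    "index per client, target (1mln clients)": 1_000_000,
    "index per document type (3 types)": 3,
}

for name, n in layouts.items():
    print(f"{name}: {primary_shards(n)} primaries, {shard_copies(n)} shard copies")
```

The per-client layout grows the shard count linearly with the number of clients, while the per-type layout stays constant however many clients are added.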
So I have 2 alternatives:
- Design it like this: /<document_type>/<client_number>/<document_id>.
This gives me 3 indices but up to 1 million document types. Would this still
cause the "too many open files" problem, or any other problems?
- Put all clients' data into one document type, e.g. put all clients' chats
into /anything/chats/<document_id>. But this gives me a really huge number
of documents in a single index and document type.
The important thing is that I always query for a specific <client_number> and
<document_type>, so I do not need the data in a common bucket.
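Because every query targets one client and one document type, a shared per-type index can use the client number as the routing value, so one client's documents land on a single shard and a search only has to visit that shard. A minimal sketch of the requests involved; the index name (equal to the document type), the mapping type `doc`, and the `client_number` field are hypothetical. The `/<index>/<type>/<id>` path matches the layout used in this thread (newer ES versions use `_doc` as the type segment), and the search body uses the current `bool` query with a `filter` clause (2012-era versions expressed the same thing with a `filtered` query):

```python
import json
from urllib.parse import quote, urlencode


def index_request(doc_type: str, client_number: int, doc_id: str, doc: dict):
    """(method, path, body) for storing one document in a shared per-type index.

    Hypothetical layout: index name == document type (e.g. "chats"), mapping
    type "doc", and a "client_number" field added to every document.
    Routing by client number keeps all of one client's documents on one shard.
    """
    params = urlencode({"routing": client_number})
    path = f"/{quote(doc_type)}/doc/{quote(doc_id)}?{params}"
    body = dict(doc, client_number=client_number)
    return "PUT", path, json.dumps(body)


def search_request(doc_type: str, client_number: int, query: dict):
    """(method, path, body) for a search restricted to one client.

    Routing sends the search only to that client's shard; the term filter is
    still required because other clients may be routed to the same shard.
    """
    params = urlencode({"routing": client_number})
    path = f"/{quote(doc_type)}/_search?{params}"
    body = {
        "query": {
            "bool": {
                "must": [query],
                "filter": [{"term": {"client_number": client_number}}],
            }
        }
    }
    return "GET", path, json.dumps(body)


method, path, body = index_request("chats", 1231, "XSAuxaS", {"text": "hello"})
print(method, path, body)
```

The shard count then depends only on the number of document types, not on the number of clients.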
So, what is the best solution in ES for this case? If you need more
information, let me know. Thanks for any help.
Best regards.
Marcin Dojwa.
Hi,
have you seen Shay's talk [1], where he explains index aliasing [2] with the
routing and filter features?
[1] Elasticsearch Platform — Find real-time answers at scale | Elastic
(it sounds like the "users" data flow, starting at slide 20/44, is what you
are looking for)
[2] Elasticsearch Platform — Find real-time answers at scale | Elastic
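A minimal sketch of the aliasing idea, assuming a shared per-type index (e.g. `chats`) whose documents carry a `client_number` field; both names and the alias pattern `<index>_<client_number>` are hypothetical. The body below would be sent to `POST /_aliases` and creates one alias per client that carries both a routing value and a filter:

```python
import json


def client_alias_actions(index: str, client_number: int) -> dict:
    """Body for POST /_aliases adding one alias for one client.

    Hypothetical naming: alias "<index>_<client_number>", documents carry a
    "client_number" field. "routing" sends requests made through the alias to
    the client's shard; "filter" restricts searches to the client's documents.
    """
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": f"{index}_{client_number}",
                    "routing": str(client_number),
                    "filter": {"term": {"client_number": client_number}},
                }
            }
        ]
    }


print(json.dumps(client_alias_actions("chats", 1231), indent=2))
```

A search such as `GET /chats_1231/_search` would then be routed and filtered automatically, so the application can address each client as if it had its own index. Indexing through the alias applies the routing but not the filter, so each document still needs its `client_number` field for the filter to match it.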
Regards,
Lukas
(In reply to Marcin Dojwa <m.dojwa@livechatinc.com>, Wed, Aug 8, 2012 at 4:03 PM.)
Thank you, I will check this out.
Best regards.
(In reply to Lukáš Vlček <lukas.vlcek@gmail.com>, 2012/8/8.)
Thank you very much, I think this solves all my problems.
Best regards.
(In reply to Marcin Dojwa <m.dojwa@livechatinc.com>, 2012/8/8.)