On Tuesday, November 27, 2012 12:15:40 AM UTC+1, Wojons Tech wrote:
I am trying to understand how _type plays a role in index performance and
how much a _type costs in the system. I am building a SaaS application where
I am going to be using time-based indexes and storing different types of
user information. What I am unsure of is this: if I am storing log
information and have different types of logs, is it best to store the log
type in the data or as the _type? I am not sure whether _type is just a
special prefix and everything ends up in the same master index. Also, is
there a big price for having lots and lots of _types?
On Tuesday, November 27, 2012 1:07:21 AM UTC-8, Michael Kleen wrote:
The _type field is a normal string field in your index which is not
analyzed. You use it to store the type name of the document being indexed.
When you execute a query restricted to a certain type, a term filter with
the type name is applied against the _type field to limit your results. How
many types are we talking about when you say "lots"?
Bests,
Michael
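A minimal sketch of what Michael describes, for readers following along: restricting a search to a type behaves like adding a term filter on _type, so a field of your own can be filtered in the same way. The request bodies use the "filtered" query from the Elasticsearch releases current at the time of this thread (later versions changed this); the function names and the log_type field are made up for illustration.

```python
def type_restricted_search(type_name, query):
    """Search body roughly equivalent to searching within one _type."""
    return {
        "query": {
            "filtered": {
                "query": query,
                # _type is not analyzed, so the term must match exactly.
                "filter": {"term": {"_type": type_name}},
            }
        }
    }


def field_restricted_search(field, value, query):
    """Same shape, but filtering on an ordinary field stored in the document."""
    return {
        "query": {
            "filtered": {
                "query": query,
                "filter": {"term": {field: value}},
            }
        }
    }


by_type = type_restricted_search("apache_error", {"match_all": {}})
by_field = field_restricted_search("log_type", "apache_error", {"match_all": {}})
```

The two bodies differ only in the field name inside the term filter, which is the point: at query time, filtering on _type is not fundamentally different from filtering on your own field.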
On Tuesday, November 27, 2012 1:32:30 PM UTC+1, Wojons Tech wrote:
Thank you for your response. I am creating a system for managing logs. The
customer will be able to select which logs on their infrastructure they want
to monitor and store on my platform; they may just use their syslog, or they
may use Apache error logs and all sorts of other things of their choice. I
was thinking that I would break the logs into daily or weekly indexes, have
a _type for each log type per customer, and use routing to limit by the
server sending that log type. This means that if you and I both have Apache
logs, we will use different _types. I was also planning on using aliases to
group multiple indexes together (the last week, the last month, and all
indexes) and on closing indexes. I would also have one or more indexes just
for summary data; my concern there is making sure I have enough shards for
the summary data.
Thanks,
Alexis
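A rough sketch of the daily-index-plus-alias plan described above. The "logs" prefix, the date format in the index name, and the alias name are assumptions made for illustration; the actions body follows the shape accepted by the _aliases endpoint.

```python
from datetime import date, timedelta


def daily_index_names(prefix, last_day, days):
    """Names of the daily indexes covering `days` days, newest first."""
    return [
        "{}-{}".format(prefix, (last_day - timedelta(days=n)).strftime("%Y.%m.%d"))
        for n in range(days)
    ]


def alias_actions(alias, index_names):
    """Body for POST /_aliases that points `alias` at every listed index."""
    return {
        "actions": [{"add": {"index": name, "alias": alias}} for name in index_names]
    }


week = daily_index_names("logs", date(2012, 11, 27), 7)
body = alias_actions("logs-last-week", week)
```

Searching the alias then fans out to the seven daily indexes behind it, so the client never needs to know the individual index names.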
On Wednesday, November 28, 2012 9:57:21 AM UTC-8, Michael Kleen wrote:
I would give search documents different types when they are semantically
different. Put your customer information in a separate field. From your
perspective, what would be the advantage of also putting customer
information in the _type field?
Michael
Well, I am not sure there would be an advantage to putting each customer's
information into one type. I am very new to Elasticsearch; this will be the
first application I write with it, and I am also writing another PHP client
for it.
The way I see it, having the client's data in the type would mean something
like this.
This way, the query makes it easy to look at all the events that took place
on that day for that user's Apache logs, searching for all the events with
the selected IP address that hit that app server. I can modify the routing
to get a different server or more of their servers, I can easily edit the IP
that I am looking for or change the log, or I can change the index from
today's to the index I have for a week of logs. I think it makes the work on
my end easy, but I am not sure whether it has a positive, a negative, or no
effect on the cluster.
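As a rough sketch of the kind of request described above (this is not the poster's original query; the index name, type name, routing value, field name, and IP address are all made up), the pieces map onto a search request like this:

```python
from urllib.parse import urlencode


def search_request(index, doc_type, routing, field, value):
    """Build the path and body of a type-scoped, routed term search."""
    path = "/{}/{}/_search?{}".format(index, doc_type, urlencode({"routing": routing}))
    body = {"query": {"term": {field: value}}}
    return path, body


# Hypothetical values: one day's index, a per-customer type, one app server.
path, body = search_request(
    index="logs-2012.11.28",
    doc_type="customer42_apache",
    routing="app-server-1",
    field="client_ip",
    value="203.0.113.7",
)
```

Swapping the index for a weekly one, the routing value for another server, or the term for another IP only changes one argument each, which is the convenience being described.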
With each type, ES has to store a mapping, so it has its overhead compared
to a simple field in your docs. So if your logs have a fixed structure,
like plain syslog or apache logs, you're probably better off if you just
add a field to every doc indicating the user. Then when searching, you can
filter by the user name.
If, on top of that, customers send similar amounts of data, you could use
the user name as the routing value, which will improve query performance
because a search then only has to hit the shard holding that user's
documents.
Another option is to use separate indices for separate customers. But then,
if you have an index per day, you'll end up with a lot of indices. That
said, you'd have to do this separation if you also support structured
logging (e.g. letting the user send you a JSON document of her choice).
Otherwise, if the same field is an integer in one type and a string in
another type, you'll get issues.
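A minimal sketch of the field-plus-routing approach suggested above, assuming a hypothetical "customer" field stored in every log document and the same value used as the routing value at index and search time. The "filtered" query matches the Elasticsearch releases of the time; the function and index names are illustrative only.

```python
from urllib.parse import urlencode


def customer_search(index, customer, query):
    """Path and body for a search limited to one customer's documents."""
    # Routing narrows the search to the shard that holds this customer's docs.
    path = "/{}/_search?{}".format(index, urlencode({"routing": customer}))
    body = {
        "query": {
            "filtered": {
                "query": query,
                # Still filter by customer: several customers can share a shard.
                "filter": {"term": {"customer": customer}},
            }
        }
    }
    return path, body


path, body = customer_search("logs-2012.11.28", "acme", {"match_all": {}})
```

The filter stays in place even with routing, because routing only picks a shard and does not guarantee that the shard holds a single customer's documents.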