Hello,
With each type, ES has to store a mapping, so a type has overhead compared
to a simple field in your docs. So if your logs have a fixed structure,
like plain syslog or Apache logs, you're probably better off just adding
a field to every doc that indicates the user. Then, when searching, you can
filter by the user name.
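A minimal sketch of that approach, assuming a shared document layout with hypothetical `user` and `log_type` fields. The body uses the `filtered` query from the ES 0.x/1.x query DSL (later versions express the same thing with a `bool` query and a `filter` clause) and would be sent to `/<index>/_search`.

```python
import json

# Hypothetical log document: the customer is stored as an ordinary field
# instead of being encoded in the _type name.
doc = {
    "user": "alexis",
    "log_type": "apache",
    "userIp": "192.168.2.1",
    "message": "GET /index.html 200",
}

def search_for_user(user: str, query_string: str) -> dict:
    """Build a search body that runs `query_string` and keeps only `user`'s docs."""
    return {
        "query": {
            "filtered": {  # ES 0.x/1.x syntax
                "query": {"query_string": {"query": query_string}},
                # Term filter = exact match; assumes `user` is mapped not_analyzed.
                "filter": {"term": {"user": user}},
            }
        }
    }

body = search_for_user("alexis", "log_type:apache")
print(json.dumps(body, indent=2))
```

Because filters are not scored and can be cached, filtering on a plain field like this is cheap compared to adding a separate type per customer.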
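If, on top of that, customers send similar amounts of data (otherwise one
big customer makes its shard much larger than the others), you could use
the user name as the routing value. A search for one user then only has to
hit the shard holding that user's docs, which improves query performance: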
http://www.elasticsearch.org/guide/reference/mapping/routing-field.html
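A sketch of how that routing would look over the REST API, assuming a node at 127.0.0.1:9200 as in this thread and a hypothetical daily index name. The same `routing` value has to be passed when indexing and when searching. Routing only selects the shard, and other users can hash to the same shard, so the user filter is still needed in the query.

```python
from urllib.parse import urlencode

ES = "http://127.0.0.1:9200"  # node address taken from the thread

def index_url(index: str, doc_type: str, user: str) -> str:
    # POST a doc here: the routing value decides which shard stores it,
    # so all of one user's docs in this index end up on the same shard.
    return f"{ES}/{index}/{doc_type}?{urlencode({'routing': user})}"

def search_url(index: str, user: str, q: str) -> str:
    # Passing the same routing value makes ES query only that user's shard.
    # Routing does not filter documents: other users may share the shard,
    # so the query still restricts on the (hypothetical) user field.
    params = {"routing": user, "q": f"user:{user} AND ({q})"}
    return f"{ES}/{index}/_search?{urlencode(params)}"

print(index_url("logs-2012.11.29", "apache", "alexis"))
print(search_url("logs-2012.11.29", "alexis", "userIp:192.168.2.1"))
```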
Another option is to use separate indices for separate customers. But then,
if you have an index per day, you'll end up with a lot of indices. That
said, you'd have to do this separation if you also plan to support structured
logging (e.g., letting the user send you JSON of her choice). Otherwise, if the
same field is an integer in one type and a string in another type of the
same index, you'll run into mapping conflicts.
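To make "a lot of indices" concrete, here is a small sketch that counts indices and primary shards for one index per customer per day. The `<customer>-YYYY.MM.DD` naming scheme and the customer names are hypothetical; 5 primary shards per index was the ES default at the time, and every shard is a Lucene index with its own resource cost.

```python
from datetime import date, timedelta

DEFAULT_PRIMARY_SHARDS = 5  # ES default number_of_shards in the 0.x era

def per_customer_daily_indices(customers, start: date, days: int) -> list:
    """Names for one index per customer per day (hypothetical naming scheme)."""
    return [
        f"{customer}-{start + timedelta(days=offset):%Y.%m.%d}"
        for customer in customers
        for offset in range(days)
    ]

names = per_customer_daily_indices(["alexis", "acme", "globex"], date(2012, 11, 1), 30)
print(len(names))                           # 3 customers x 30 days = 90 indices
print(len(names) * DEFAULT_PRIMARY_SHARDS)  # 450 primary shards, before replicas
print(names[0], names[-1])
```

The count grows linearly with both customers and retained days, which is why a shared daily index plus a user field scales more gently.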
Best regards,
Radu
http://sematext.com/ -- ElasticSearch -- Solr -- Lucene
On Thu, Nov 29, 2012 at 11:41 AM, Wojons Tech wojonstech@gmail.com wrote:
Well, I am not sure whether there would be an advantage to putting each
customer's information into its own type. I am very new to Elasticsearch;
this will be the first application I am writing with it, and I am also
writing another PHP client for it.
The way I see it, having the client's data in the type would mean something
like this:
http://127.0.0.1:9200/today/alexis-apache/_search?q=userIp:192.168.2.1&routing=app1
This query makes it easy to look at all the events that took place on that
day in that user's Apache logs, searching for all the events with the
selected IP address that hit that app server. I can change the routing value
to target a different server, or more of their servers; I can easily edit the
IP I am looking for or change the log; or I can change the index from "today"
to the index I have for a week of logs. I think it makes the work on my end
easy, but I am not sure whether it has a positive, negative, or no effect on
the cluster.
On Wednesday, November 28, 2012 9:57:21 AM UTC-8, Michael Kleen wrote:
Hi Wojan,
I would give documents different types when they are semantically
different. Put your customer information in a separate field. What
advantages do you see in also putting customer information in the _type
field?
Michael
On Tuesday, November 27, 2012 1:32:30 PM UTC+1, Wojons Tech wrote:
Michael,
Thank you for your response. I am creating a system for managing logs.
Customers will be able to select which logs on their infrastructure they
want to monitor and store on my platform: they may just use their syslog, or
they may use Apache error logs and all sorts of other things of their
choice. I was thinking that I would split the logs into daily or weekly
indices, then have a _type for each log type per customer, and then use
routing to limit by the server sending that log type. This means that if you
and I both have Apache logs, we will use different _types.
I was also planning on using aliases to group multiple indices together (for
example the last week, the last month, and all indices), and to close
indices. I would also have one or more indices just for summary data; the
issue is that I want to make sure I have enough shards for the summary data.
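A sketch of the alias part of this plan: a body for the `_aliases` endpoint that points a rolling `last-week` alias at the seven most recent daily indices and removes the index that just fell out of the window. Actions in one `_aliases` request are applied atomically. The `logs-YYYY.MM.DD` index names and the alias name are hypothetical.

```python
import json
from datetime import date, timedelta

def rolling_alias_actions(alias: str, today: date, days: int, prefix: str = "logs-") -> dict:
    """Body for POST /_aliases: add the last `days` daily indices, remove the one before them."""
    def name(day: date) -> str:
        return f"{prefix}{day:%Y.%m.%d}"

    actions = [
        {"add": {"index": name(today - timedelta(days=offset)), "alias": alias}}
        for offset in range(days)
    ]
    # The index just outside the window leaves the alias in the same request.
    # Assumes it was added by the previous day's run.
    actions.append({"remove": {"index": name(today - timedelta(days=days)), "alias": alias}})
    return {"actions": actions}

body = rolling_alias_actions("last-week", date(2012, 11, 27), 7)
print(json.dumps(body, indent=2))
```

Searching `/last-week/_search` then covers all seven indices, and an index that is no longer needed for search can be closed with `POST /<index>/_close`.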
Thanks,
Alexis
On Tuesday, November 27, 2012 1:07:21 AM UTC-8, Michael Kleen wrote:
Hello,
The _type field is a normal string field in your index which is not
analyzed. You use the _type field to store the name of the kind of document
you index. When you execute a query restricted to a certain type, a term
filter with the type name is applied against the _type field to limit your
results. How many types are we talking about when you say "lots"?
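Following that description, restricting a search to a type is just another term filter, so the two designs discussed in this thread differ mainly in how many mappings each index has to hold. A sketch, assuming hypothetical `user` and `log_type` names, the `bool` filter syntax of that era, and the worst case where every customer sends every log type:

```python
def type_scheme_filter(customer: str, log_type: str) -> dict:
    # One _type per customer and log type, e.g. "alexis-apache".
    return {"term": {"_type": f"{customer}-{log_type}"}}

def field_scheme_filter(customer: str, log_type: str) -> dict:
    # One _type per log type, with the customer stored in a plain field.
    return {"bool": {"must": [
        {"term": {"_type": log_type}},
        {"term": {"user": customer}},
    ]}}

def mappings_per_index(customers: list, log_types: list, type_per_customer: bool) -> int:
    # Every distinct _type in an index carries its own mapping.
    if type_per_customer:
        return len(customers) * len(log_types)
    return len(log_types)

customers = ["alexis", "acme", "globex"]
log_types = ["apache", "apache-error", "syslog"]
print(mappings_per_index(customers, log_types, type_per_customer=True))   # 9
print(mappings_per_index(customers, log_types, type_per_customer=False))  # 3
```

With time-based indices, the per-index mapping count is repeated in every daily or weekly index.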
Bests,
Michael
On Tuesday, November 27, 2012 12:15:40 AM UTC+1, Wojons Tech wrote:
I am trying to understand what role _type plays in index performance and
how much a _type costs in the system. I am building a SaaS application
where I am going to use time-based indices and store different types of
user information. What I don't know is this: if I am storing log information
and have different types of logs, is it best to store the log type in the
data, or as the _type? I am not sure whether _type is just a special prefix
that is, in the end, part of the same master index. Also, is there a big
price to having lots and lots of _types?