Flume message indexing and search on attributes

We have started experimenting with Elasticsearch for logs collected via
Flume from our servers. ES is awesome, and we plan to go to production
with a setup in which Flume pumps its output directly into ES.

Question: we want to be able to search in ES on Flume event
attributes such as priority, custom meta tags and more. We also
want to sort ascending or descending by the timestamp or the
nanoseconds value that Flume provides.

Given that Flume emits JSON, we believe it should be a good fit for ES.
Do we need any special mapping to search on attributes and to sort?
Could someone on the list please explain how to set up the ES config
so that we can achieve the above?

Thanks in advance.

Cheers
Srini

Hi Srini

ES does its best to guess what type of data each field contains (the
first time it sees a new field), e.g. for a doc with:

{
    "title": "Foo",
    "count": 5,
    "live": true,
    "date": "2011-12-03T12:00:00"
}

...ES would correctly identify:

  • title: full text string
  • count: long
  • live: boolean
  • date: datetime

However, with:

{
    "count": "10",
    "status": "ACTIVE",
    "tags": ["foo","bar-baz"]
}

... it would identify all of these as full-text strings, which probably
isn't what you want.

"count" should be a number, "status" and "tags" should be type "string",
but with {"index": "not_analyzed"}, so that searching for the exact
term "ACTIVE" doesn't also match "Active", and searching for "bar"
doesn't match "bar-baz".
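
To make the analyzed vs. not_analyzed difference concrete, here is a rough
Python sketch. It is only a toy imitation of what an analyzer does
(lowercase, then split on non-word characters), not the real ES analysis
chain, and the function names are made up for illustration:

```python
import re

def analyze(text):
    """Toy stand-in for an analyzer: lowercase, then split on non-word chars."""
    return re.findall(r"\w+", text.lower())

def matches_analyzed(stored, query):
    # An analyzed field is indexed token by token, and the query text is
    # analyzed the same way, so partial and case-insensitive hits occur.
    stored_tokens = set(analyze(stored))
    return any(token in stored_tokens for token in analyze(query))

def matches_not_analyzed(stored, query):
    # A not_analyzed field is indexed as one exact, unmodified term.
    return stored == query

print(matches_analyzed("bar-baz", "bar"))        # True
print(matches_analyzed("ACTIVE", "active"))      # True
print(matches_not_analyzed("bar-baz", "bar"))    # False
print(matches_not_analyzed("ACTIVE", "active"))  # False
print(matches_not_analyzed("ACTIVE", "ACTIVE"))  # True
```

With the analyzed version, "bar-baz" is stored as the two tokens "bar" and
"baz", which is why a search for "bar" finds it.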

To avoid these errors, you should predefine your mappings. ES makes it
easy to try things out by just inserting docs. You can use the 'get
mapping' API to see how ES has mapped each field.
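
As a minimal sketch of checking the guessed mapping from Python, assuming a
node on localhost:9200 and an index called "flume" (both are assumptions,
substitute your own), the 'get mapping' call is a plain GET on the index's
_mapping endpoint:

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, default port
INDEX = "flume"                   # hypothetical index name

def get_mapping_request(index):
    """Build (but do not send) a GET request for the index's current mapping."""
    return urllib.request.Request(f"{ES_URL}/{index}/_mapping", method="GET")

def fetch_mapping(index):
    # Sends the request; this needs a running node. Returns the parsed JSON.
    with urllib.request.urlopen(get_mapping_request(index)) as resp:
        return json.load(resp)

req = get_mapping_request(INDEX)
print(req.get_method(), req.full_url)
```

Calling fetch_mapping("flume") after indexing a few sample events should
show which type ES picked for each field.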

You can use this mapping info to build your own correct mapping, which
you can specify when you create the index.
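
A hedged sketch of what that could look like for the second example doc,
using the "string" / "not_analyzed" syntax described above (later ES
releases use a "keyword" type instead and drop the per-type level in
"mappings"). The index name "flume", the type name "event" and the
"timestamp" field are assumptions, since the actual Flume field names are
not given here:

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, default port
INDEX = "flume"                   # hypothetical index name
DOC_TYPE = "event"                # hypothetical mapping type name

# Explicit mapping: numeric count, exact-match status and tags.
index_body = {
    "mappings": {
        DOC_TYPE: {
            "properties": {
                "count": {"type": "long"},
                "status": {"type": "string", "index": "not_analyzed"},
                "tags": {"type": "string", "index": "not_analyzed"},
                "timestamp": {"type": "date"},  # hypothetical field, for sorting
            }
        }
    }
}

def create_index_request(index, body):
    """Build (but do not send) the PUT request that creates the index."""
    return urllib.request.Request(
        f"{ES_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# Search body: exact match on status, newest events first.
search_body = {
    "query": {"term": {"status": "ACTIVE"}},
    "sort": [{"timestamp": {"order": "desc"}}],
}

req = create_index_request(INDEX, index_body)
print(req.get_method(), req.full_url)
print(json.dumps(search_body))
```

Passing req to urllib.request.urlopen would create the index on a running
node. Switching "desc" to "asc" reverses the ordering, and the same sort
clause works on any numeric or date field.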

clint

Hi Clint

Thanks a lot for that. I will check this out and mail back in case I need further hints.

Cheers
Srini

-----Original Message-----
From: Clinton Gormley clint@traveljury.com
Sender: elasticsearch@googlegroups.com
Date: Sat, 03 Dec 2011 11:10:31
To: elasticsearch@googlegroups.com
Reply-To: elasticsearch@googlegroups.com
Subject: Re: Flume message indexing and search on attributes
