How to separate indexed data by customerId, can indexes expire?

gitted · April 23, 2013, 1:33pm

I want to store HTTP logs for hundreds of websites in ES, but when a user
searches, they will ONLY search the logs for one particular website,
not across all websites.

#1 How should I go about designing how this is stored in ES?

#2 Is there any way to expire the indexed data by a specific date? Is
this something built in, or will I have to remove it manually?

e.g. some website log data should expire (removed from the index) after 1
month, while others should expire in 6 months => it varies per website
potentially.

#3 When indexing data into ES, say I index an article. So when someone
performs a search, they will get some search results and say they click on
a link that relates to article#123, do I have to store article#123's
content somewhere else to display the ENTIRE article or can I also somehow
get this from ES?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

shadyabhi · April 23, 2013, 6:02pm

You should have a look at the Logstash project. Please see my answers to
your questions inline. They will make a lot more sense if you
read a bit about Logstash first.

On Tue, Apr 23, 2013 at 7:03 PM, gitted sahmed1020@gmail.com wrote:

I want to store http logs for 100's of websites into ES, but when a user
searches, they will ONLY search for logs for a particular website, not
across all websites.

#1 How should I go about designing how this is stored in ES?

Since you already know that each search targets a single website's logs,
you might want to use "tags" in the ES output to send each website's data
to a separate index, and set the index string in the ES output block
accordingly (so that a separate index is created for each website).
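
To make the one-index-per-website idea concrete, here is a minimal Python sketch of how an application might derive the index name from a site name. The function name `site_index_name`, the `logs-` prefix, and the character whitelist are all assumptions for illustration, not something Logstash or ES prescribes; the only hard constraint relied on is that ES index names must be lowercase.

```python
import re


def site_index_name(site: str, prefix: str = "logs") -> str:
    """Build a per-website index name such as 'logs-example.com'.

    The 'logs' prefix and the allowed character set are illustrative choices.
    """
    # Lowercase the site name and collapse anything outside a conservative
    # character set (letters, digits, dots) into single hyphens.
    slug = re.sub(r"[^a-z0-9.]+", "-", site.strip().lower()).strip("-")
    if not slug:
        raise ValueError("site name is empty after normalisation")
    return f"{prefix}-{slug}"


if __name__ == "__main__":
    print(site_index_name("Example.COM"))
```

Searches for one website can then be sent to that website's index only, so other sites' logs are never touched.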

#2 Is there any way to expire the indexed data by a specific date? Is
this something built in, or will I have to remove it manually?

There is a TTL (time to live) for documents, but there is nothing like
that for indices.

e.g. some website log data should expire (removed from the index) after 1
month, while others should expire in 6 months => it varies per website
potentially.

The way it's done in Logstash is that a new index is created each day
(generally speaking; it's all configurable). So you could create a new
index every day, with the index name containing the date on which it
was created. You can then use that date to delete the relevant indices
according to your policy.
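
A rough Python sketch of that policy, assuming daily indices named `logs-<site>-<YYYY.MM.dd>` and a per-site retention table. The naming scheme, the `RETENTION_DAYS` table, and the function name are hypothetical and only illustrate the date arithmetic. Each name it returns could then be removed with an ordinary delete-index request.

```python
from datetime import date, datetime, timedelta

# Hypothetical per-site retention periods, in days.
RETENTION_DAYS = {"example.com": 30, "example.org": 180}


def expired_indices(index_names, today, retention=RETENTION_DAYS, prefix="logs-"):
    """Return the index names whose date suffix is older than the site's retention."""
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of our log indices
        # Split '<site>-<YYYY.MM.dd>' at the last hyphen.
        site, _, stamp = name[len(prefix):].rpartition("-")
        try:
            created = datetime.strptime(stamp, "%Y.%m.%d").date()
        except ValueError:
            continue  # no parseable date suffix, leave the index alone
        days = retention.get(site)
        if days is not None and today - created > timedelta(days=days):
            expired.append(name)
    return expired


if __name__ == "__main__":
    names = [
        "logs-example.com-2013.03.01",
        "logs-example.com-2013.04.20",
        "logs-example.org-2013.03.01",
    ]
    # Only the first index is past example.com's 30-day retention.
    print(expired_indices(names, today=date(2013, 4, 23)))
```

Sites missing from the retention table are never selected, so an unknown site keeps its data by default.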

#3 When indexing data into ES, say I index an article. So when someone
performs a search, they will get some search results and say they click on a
link that relates to article#123, do I have to store article#123's content
somewhere else to display the ENTIRE article or can I also somehow get this
from ES?

I'm not entirely sure what is being asked here. If the article came up in
the search results, that means it's there in ES, right? Each document in an
index has a unique _id value, if that helps.
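
As an illustration, assuming the `_source` field is enabled (which is the ES default), each search hit carries the original document alongside its `_id`. The sketch below reads both from a response that has already been parsed into a Python dict. The sample response is hypothetical and trimmed, and `articles_from_hits` is an illustrative helper, not part of any ES client.

```python
def articles_from_hits(response: dict) -> dict:
    """Map each hit's _id to the original document held in its _source."""
    hits = response.get("hits", {}).get("hits", [])
    return {hit["_id"]: hit.get("_source", {}) for hit in hits}


# Hypothetical, trimmed search response for illustration only.
sample_response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "articles",
                "_id": "123",
                "_source": {"title": "Hello", "body": "Full article text"},
            }
        ],
    }
}

if __name__ == "__main__":
    articles = articles_from_hits(sample_response)
    print(articles["123"]["body"])
```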

--
Regards,
Abhijeet Rastogi (shadyabhi)
http://blog.abhijeetr.com

gitted · April 23, 2013, 6:20pm

Thanks for your replies.

I'm not entirely sure what is being asked here. If the article came in
search results that means it's there in ES, right? Each document in an
index has a unique _id value, if that helps.

My assumption was that ES breaks each document up into tokens, and that
when you search, it finds matches and associates each search result with a
document ID. For some reason I thought I would then have to pull that
document from my database (MySQL), because ES would not store the document
itself, only its tokenized index.
