Exporting a large number of entries from elasticsearch


(Simeon Zaharici) #1

Hello

I am a new user of elasticsearch. We are using graylog2, a log
management solution that stores the log messages in elasticsearch.

I would like to be able to extract all log messages created in one
day, for archiving purposes. Our servers and applications generate
around 5 million entries a day. What would be the best way to extract
these entries from elasticsearch?

When I run a search that matches all of these messages, the query
hangs forever and the load on the elasticsearch node against which
the query is executed goes up.

Here is the query I am executing (inner quotes escaped so the shell
passes the JSON body through intact):

curl -XGET 'http://server:9200/graylog2/message/_search' -d "{
  \"size\" : \"$MESSAGES\",
  \"query\" : {
    \"range\" : {
      \"created_at\" : {
        \"from\" : \"$DATE_YESTERDAY\",
        \"to\" : \"$DATE_NOW\"
      }
    }
  }
}" | /usr/bin/bzip2 > "/var/lib/lc-archive/${date_today}.json.bz2"

Thanks


(Berkay Mollamustafaoglu-2) #2

Hi,
You may want to take a look at the scan search type
http://www.elasticsearch.org/guide/reference/api/search/search-type.html

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype



(Simeon Zaharici) #3

Hello

Thanks for your answer. I tried using the scan search type, but the
behavior is the same: the curl request against the scroll id hangs
forever and the elasticsearch node against which the query was run
becomes non-responsive...

Thanks



(Clinton Gormley) #4


You didn't specify what value $MESSAGES has.

The idea is to use search_type=scan together with a scrolled search
and a reasonable size (eg 1000).

Each pull then returns up to 1000 x (number of primary shards)
documents, and you keep pulling until there are no more records left
to pull.
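To make that concrete, here is a rough sketch of the loop in Python.
The `fetch` callable stands in for the curl/HTTP call, and the
endpoint paths follow the scan/scroll API as commonly used, so treat
the details as an illustration rather than a drop-in script:

```python
import json

def scroll_all(fetch, query, size=1000, scroll="5m"):
    """Yield every hit matching `query` via scan + scroll.

    `fetch(path, body)` performs the HTTP request and returns the
    parsed JSON response.
    """
    # Start a scan: the first response carries no hits, only a
    # scroll id that covers all shards.
    resp = fetch(
        "/graylog2/message/_search?search_type=scan&scroll=%s&size=%d"
        % (scroll, size),
        json.dumps({"query": query}))
    scroll_id = resp["_scroll_id"]
    while True:
        # Each pull returns up to size * (number of primary shards) hits.
        resp = fetch("/_search/scroll?scroll=%s" % scroll, scroll_id)
        hits = resp["hits"]["hits"]
        if not hits:
            break  # nothing left to pull
        scroll_id = resp["_scroll_id"]
        for hit in hits:
            yield hit
```

Each page can then be appended to the archive file as it arrives,
instead of asking for millions of documents in a single response.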

clint



(Karussell) #5

I would like to be able to extract all log messages that are created in one day for archiving purposes.

Why not back up one full index? It would be much faster and cheaper
in terms of CPU. (Not sure if graylog has an option to move to a new
index per day or something.)

Peter.



(Simeon Zaharici) #6

Ah, thanks a lot. I was using the scroll all wrong: I had 3 million
entries as the size. It works very well when scrolling a reasonable
number of entries at a time.



(Simeon Zaharici) #7

Hi,

No, graylog does not have that option (yet?), so that's why I am
implementing the daily archive.


