How to change the default index settings for the Wikipedia river?

Hi there,

I want to index all English Wikipedia pages in my ES cluster, and the
Wikipedia river looks like a convenient way to build the index.

But I'm a little confused about how to change the default index settings
when indexing through the Wikipedia river.

As I understand it, the REST API call would look as follows:

curl -XPUT localhost:9200/_river/my_river/_meta -d '
{
    "type" : "wikipedia",
    "wikipedia" : {
        "url" : "http://dumps.wikimedia.org/enwiki/20121201/enwiki-20121201-pages-articles-multistream.xml.bz2"
    },
    "index" : {
        "name" : "wikipedia",
        "type" : "page",
        "bulk_size" : 100,
        "settings" : {
            "index.number_of_shards" : "10",
            "index.number_of_replicas" : "0"
        }
    }
}'

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I would create the index first, with the settings you want, and then create the river.

HTH

David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

(In reply to Jingang Wang, bitwjg@gmail.com, 21 May 2013 at 09:14.)


Hi David,

Thanks for your response.
Could you give more details?
Do you mean that I should create the _river index first?

Thanks.

(In reply to David Pilato, david@pilato.fr, Tue, May 21, 2013 at 3:21 PM.)


--
Wang Jingang(王金刚)
Ph.D. Candidate at
Lab of High Volume Language Information Processing & Cloud Computing
School of Computer Science
Beijing Institute of Technology
Beijing 100081
P.R China


I meant: first create your index using the Create Index API.

You can define your number of shards and replicas there.
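
For reference, here is a minimal sketch of that first step, using the shard and replica counts from the original question. The index name wikipedia is an assumption: it has to match whichever index the river actually writes to. The helper name create_index is made up for this sketch; wrapping the curl call in it means nothing is sent to the cluster until you call it.

```shell
#!/bin/sh
# Sketch only: host and settings values are copied from this thread.
ES_HOST="localhost:9200"
# Assumption: must be the index the Wikipedia river will write to.
INDEX_NAME="wikipedia"

# Request body for the Create Index API, carrying the settings that
# the river definition itself should not try to set.
INDEX_BODY='{
  "settings": {
    "index": {
      "number_of_shards": 10,
      "number_of_replicas": 0
    }
  }
}'

# Hypothetical helper: creates the index with the settings above.
create_index() {
  curl -XPUT "http://${ES_HOST}/${INDEX_NAME}" -d "${INDEX_BODY}"
}
```

Call create_index once, check that the response reports success, and only then register the river. The number of shards cannot be changed after the index exists, which is why the index has to be created before the river starts writing to it.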

THEN (= after index creation), create your river:

curl -XPUT localhost:9200/_river/my_river/_meta -d '{
    "type" : "wikipedia",
    "wikipedia" : {
        "url" : "http://dumps.wikimedia.org/enwiki/20121201/enwiki-20121201-pages-articles-multistream.xml.bz2"
    }
}'

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

(In reply to Jingang Wang, bitwjg@gmail.com, 21 May 2013 at 09:28.)


Thanks so much, David.

I resolved the problem by following your advice.

(In reply to David Pilato, david@pilato.fr, Tue, May 21, 2013 at 3:31 PM.)
