Beginner's Problems


(sun) #1

Hi, I am new to ES and to search engines in general. I installed
0.18.7 on my notebook (Mac OS X 10.7.2) and stumbled over these
things:

  1. Calling "bin/elasticsearch stop" seems to start a new node instead
    of shutting down ES

  2. I pump in data via the Java API, but after a few thousand documents
    the cluster becomes yellow and I get exceptions when calling prepareIndex.

{
cluster_name: elasticsearch
status: yellow
timed_out: false
number_of_nodes: 1
number_of_data_nodes: 1
active_primary_shards: 5
active_shards: 5
relocating_shards: 0
initializing_shards: 0
unassigned_shards: 5
}

Killing the ES process and restarting ES does not change the
situation. The only thing that works for me is deleting the data
directory before restarting, but then the same thing happens again. I see
nothing that looks like an error in the log file.

  3. I managed to use the Java API to index my data but not to run the
    query I want. The Java documentation needs some more love, I think.
    With a matchAllQuery() I get everything back, but I wanted something
    like: find the text "xyz" occurring anywhere in the title or the
    text of my document, in any one of some given categories. I gave up
    after hours. Can anybody point me in the right direction? I know I need
    to read the source, but I have no time right now.

I need to understand the concepts of ES and search engines much better,
but what I understand so far is fantastic. What have I been doing all these
years using only databases? Thanks for the good work and for open sourcing
it. I hope I get over the beginner's hurdles quickly.


(Maurício Linhares) #2

On Mon, Jan 30, 2012 at 10:47 AM, sun google@suncom.de wrote:

Hi, I am new to ES and to search engines in general. I installed
0.18.7 on my notebook (Mac OS X 10.7.2) and stumbled over these
things:

  1. Calling "bin/elasticsearch stop" seems to start a new node instead
    of shutting down ES

You are probably referring to an older version of ES that came with the
service wrapper script. It is not bundled anymore; you have to grab it
from here: https://github.com/elasticsearch/elasticsearch-servicewrapper

But if this is just for starting and stopping on your machine, I'd just do this:

cd "ES_HOME"/bin
./elasticsearch -p search.pid # to start
kill $(cat search.pid) # to stop it

  2. I pump in data via the Java API, but after a few thousand documents
    the cluster becomes yellow and I get exceptions when calling prepareIndex.

{
cluster_name: elasticsearch
status: yellow
timed_out: false
number_of_nodes: 1
number_of_data_nodes: 1
active_primary_shards: 5
active_shards: 5
relocating_shards: 0
initializing_shards: 0
unassigned_shards: 5
}

Killing the ES process and restarting ES does not change the
situation. The only thing that works for me is deleting the data
directory before restarting, but then the same thing happens again. I see
nothing that looks like an error in the log file.

You said you get exceptions. Which exceptions do you get?

  3. I managed to use the Java API to index my data but not to run the
    query I want. The Java documentation needs some more love, I think.
    With a matchAllQuery() I get everything back, but I wanted something
    like: find the text "xyz" occurring anywhere in the title or the
    text of my document, in any one of some given categories. I gave up
    after hours. Can anybody point me in the right direction? I know I need
    to read the source, but I have no time right now.

A simple example of how to query for the text "elastic" in all fields
of the "news_story" documents:

curl -X GET "http://localhost:9200/news_stories/news_story/_search?pretty=true" \
  -d '{"query":{"query_string":{"query":"elastic"}},"size":10,"from":0}'
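
For the query described in the question (the text "xyz" in the title or the text, restricted to any one of several categories), a request body along the following lines may be closer than a plain query_string over all fields. This is only a sketch: the field names title, text and category, the category values, and the helper class SearchBody are assumptions made up for illustration, and the body has not been checked against a 0.18.7 node. It wraps a query_string limited to two fields and a terms query in a bool query, and it only builds the JSON string, which could then be passed as the -d argument of a curl call like the one above.

```java
import java.util.Arrays;
import java.util.List;

public class SearchBody {

    // Quote a string for JSON. Only backslashes and double quotes are escaped.
    public static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Render a list of strings as a JSON array of strings.
    public static String array(List<String> values) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) sb.append(",");
            sb.append(quote(values.get(i)));
        }
        return sb.append("]").toString();
    }

    // Build: bool { must: [ query_string on the given fields, terms on "category" ] }.
    // "category" is a hypothetical field name, not taken from the thread.
    public static String build(String text, List<String> fields, List<String> categories) {
        return "{\"query\":{\"bool\":{\"must\":["
            + "{\"query_string\":{\"query\":" + quote(text) + ",\"fields\":" + array(fields) + "}},"
            + "{\"terms\":{\"category\":" + array(categories) + "}}"
            + "]}}}";
    }

    public static void main(String[] args) {
        // Hypothetical fields and category values.
        System.out.println(build("xyz",
            Arrays.asList("title", "text"),
            Arrays.asList("news", "sports")));
    }
}
```

Note that a terms query is not analyzed, so the category values would have to match the indexed tokens exactly.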

Maurício Linhares
http://techbot.me/ - http://twitter.com/#!/mauriciojr


(sun) #3

Hi Maurício,

thanks for responding, but my questions are still open.

  1. I am using version 0.18.7. As of this morning it was the latest release.

  2. and 3. My questions are about the Java API, not the REST API. The
    exception says that the ES cluster is not reachable (timeout).

Stefan


(Maurício Linhares) #4

On Mon, Jan 30, 2012 at 12:51 PM, sun google@suncom.de wrote:

Hi Maurício,

thanks for responding, but my questions are still open.

  1. I am using version 0.18.7. As of this morning it was the latest release.

The tutorial you are following, which says the bin/elasticsearch
script accepts "start/stop", is outdated. The script that does that is in
the repo I linked in the previous email. The bin/elasticsearch script
itself does not accept "start/stop", but you can use the commands I
provided in the previous email to start and stop it.

  2. and 3. My questions are about the Java API, not the REST API. The
    exception says that the ES cluster is not reachable (timeout).

If it says the cluster is not reachable, either you're not connecting
to the correct IP/port pair or you don't have an ES server running at
the IP/port pair you're providing.

Maurício Linhares
http://techbot.me/ - http://twitter.com/#!/mauriciojr


(sun) #5

Thanks Maurício,

now I get it.

Can anyone help me with questions 2 and 3?


(Shay Banon) #6
  1. Yes, the status is yellow, because you started a single node and elasticsearch by default creates an index with 5 shards, each with 1 replica. With only one node, the replicas have no place to be allocated. The health will be yellow from the start, as soon as you create the index. This will not cause indexing to fail, though. What exception are you getting? (gist it).

  2. It might relate to how you index the data. A gist with a sample of what you index and what you search for would help (preferably with curl). Note that by default, text is analyzed and broken down into tokens. If you use a term query to search, you might not find anything, because the text in the query is not analyzed. Use a text query instead.
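
The point about analysis in item 2 can be illustrated without a running cluster. The sketch below is not Elasticsearch or Lucene code: analyze is a hypothetical stand-in that lowercases the text and splits it on anything that is not a letter or digit, which only roughly approximates what a default analyzer does. Under that assumption, it shows why an un-analyzed term lookup can miss a document that an analyzed lookup finds.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalysisSketch {

    // Hypothetical stand-in for an analyzer: lowercase, then split on
    // anything that is not a letter or digit. Real analyzers do more.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // "Index" a field value: only the analyzed tokens are kept for lookup.
    public static Set<String> index(String fieldValue) {
        return new HashSet<String>(analyze(fieldValue));
    }

    // Term-style lookup: the query text is compared as-is, without analysis.
    public static boolean termMatch(Set<String> indexed, String query) {
        return indexed.contains(query);
    }

    // Text-style lookup: the query text is analyzed first; any token may match.
    public static boolean textMatch(Set<String> indexed, String query) {
        for (String token : analyze(query)) {
            if (indexed.contains(token)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> indexed = index("Elastic Search in Action");
        System.out.println(termMatch(indexed, "Elastic")); // false: the indexed token is "elastic"
        System.out.println(textMatch(indexed, "Elastic")); // true: the query is lowercased first
    }
}
```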

On Tuesday, January 31, 2012 at 12:47 AM, sun wrote:

Thanks Maurício,

now I get it.

Can anyone help me with questions 2 and 3?


(sun) #7
  1. I got a timeout exception, not an ES exception, and it has not happened
    again. I did not understand what a shard or a replica was when I
    asked the question. Now I am reading the Lucene documentation and
    the concepts are slowly getting clearer. I think it is necessary to
    understand Lucene before starting with ES; otherwise the concepts are
    not clear. There are still many terms I need to look up in the Lucene
    documentation, like cluster, node (I guess it's not the same as
    QueryNode), replica, mapping, facet.

  2. I guess I only create an index implicitly. I tried to create an
    index explicitly with a mapping, following an example I found here:
    http://groups.google.com/group/elasticsearch/browse_thread/thread/d92056b2beff1f40/8309396b5eff2f8e?lnk=gst&q=preparePutMapping
    But I get an IndexMissingException.

I published a gist here: git://gist.github.com/1755678.git. It is in
Scala, but very close to Java. I am just iterating via reflection over
all my model properties.
I tried to work with JSON but without success; I will post a separate
question about that.


(Shay Banon) #8

On Tuesday, February 7, 2012 at 1:01 AM, sun wrote:

  1. I got a timeout exception, not an ES exception, and it has not happened
    again. I did not understand what a shard or a replica was when I
    asked the question. Now I am reading the Lucene documentation and
    the concepts are slowly getting clearer. I think it is necessary to
    understand Lucene before starting with ES; otherwise the concepts are
    not clear. There are still many terms I need to look up in the Lucene
    documentation, like cluster, node (I guess it's not the same as
    QueryNode), replica, mapping, facet.

All of those are elasticsearch concepts, not Lucene :)
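
To make shard and replica a little more concrete, the numbers in the health output at the top of the thread can be reproduced with simple arithmetic. The following is a rough sketch, not how elasticsearch computes health internally: the class and method names are hypothetical, and it assumes the only rule in play is that two copies of the same shard (a primary and its replica) are never placed on the same node.

```java
public class ShardSketch {

    // Each shard has 1 primary plus its replicas, but under the assumed rule
    // at most one copy of a shard can live on any one node.
    public static int activeShards(int primaries, int replicasPerShard, int nodes) {
        return primaries * Math.min(1 + replicasPerShard, nodes);
    }

    // Copies that exist on paper but have no node to be allocated to.
    public static int unassignedShards(int primaries, int replicasPerShard, int nodes) {
        int total = primaries * (1 + replicasPerShard);
        return total - activeShards(primaries, replicasPerShard, nodes);
    }

    // red: primaries cannot be placed; yellow: primaries placed but some
    // replica is unassigned; green: every copy is placed.
    public static String status(int primaries, int replicasPerShard, int nodes) {
        if (nodes < 1) return "red";
        return unassignedShards(primaries, replicasPerShard, nodes) > 0 ? "yellow" : "green";
    }

    public static void main(String[] args) {
        // Defaults mentioned in this thread: 5 shards, 1 replica each, one node.
        System.out.println(activeShards(5, 1, 1));     // 5
        System.out.println(unassignedShards(5, 1, 1)); // 5
        System.out.println(status(5, 1, 1));           // yellow
        System.out.println(status(5, 1, 2));           // green
    }
}
```

Under these assumptions, starting a second node in the same cluster, or creating the index with 0 replicas, would be expected to bring the status to green.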

  2. I guess I only create an index implicitly. I tried to create an
    index explicitly with a mapping, following an example I found here:
    http://groups.google.com/group/elasticsearch/browse_thread/thread/d92056b2beff1f40/8309396b5eff2f8e?lnk=gst&q=preparePutMapping
    But I get an IndexMissingException.

I published a gist here: git://gist.github.com/1755678.git (http://gist.github.com/1755678.git). It is in
Scala, but very close to Java. I am just iterating via reflection over
all my model properties.
I tried to work with JSON but without success; I will post a separate
question about that.

