Hi,
I'm new to ES and to the search server world. I decided to check out ES
because I saw in some reviews that it is easy to use.
I managed to index my docs (10M) and to run some queries, but the
problem is that they are very, very slow (10 sec).
I didn't configure anything and didn't change anything in the yml file or
any other file. My problem with setting up and tuning the system is that
I'm very weak in the whole indexing world and don't have a background in
clusters, nodes, etc. I don't know how to proceed, because the more I
read, the less I understand, and each subject I learn leads me to
more subjects...
Does anyone know a very basic tutorial on the basic concepts and on
basic tuning for performance?
1) If not, maybe just explain to me the difference between index
name and index type.
2) Does every doc need its own id?
If you can share some info about your documents and what kind of queries
you were running, that could help a lot.
As for 1)
You can think of the index name as an identifier for a single Lucene index. All
document types that fall under the same index name are indexed into the
same Lucene index (the index can in fact be distributed, so for one index
name there can exist several Lucene indices on different machines, but
let's not complicate the discussion with that for now). Try searching the ES
doc pages and reading a bit about it. Index names and types are not complicated
stuff.
For example take a look at:
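To make the index name / type / id relationship concrete, here is a minimal sketch of how the three parts combine into the address of a single document in the ES releases of that era (`/{index}/{type}/{id}`). `DocAddress` and the names `library`, `book` and `author` are hypothetical, chosen only for illustration. On question 2: every document does end up with an id, but as far as I know you do not have to supply it yourself, since ES generates one when the index request carries none.

```java
// Sketch only: DocAddress is a hypothetical helper, not part of the ES client.
final class DocAddress {
    private DocAddress() {}

    // Builds the REST path that addresses one document: /{index}/{type}/{id}
    static String path(String index, String type, String id) {
        return "/" + index + "/" + type + "/" + id;
    }

    public static void main(String[] args) {
        // Two types can live under the same index name, so the same id
        // may appear once per type without clashing.
        System.out.println(path("library", "book", "1"));
        System.out.println(path("library", "author", "1"));
    }
}
```

The three arguments correspond to the three arguments of `client.prepareIndex(index, type, id)` used later in this thread.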
First, thanks for the quick reply, it helped a lot.
What I tried to do is use the ES Java API to index my DB. Each
record in the DB represents a book. This is the code:
IndexResponse response = client.prepareIndex(ES.INDEX_NAME, ES.INDEX_TYPE, "" + counter)
    .setSource(jsonBuilder()
        .startObject()
            .field("id_unique", counter)
            .field("id", catalogID.toString())
            .field("parentIsbn", parentIsbn)
            .field("title", rs.getString("Title"))
            .field("title_exact", rs.getString("Title").toLowerCase())
            .field("seriesId", rs.getString("SeriesID"))
            .field("seriesNumber", rs.getString("SeriesNumber"))
            .field("publicationDate", rs.getString("PublicationDate"))
            .field("editionNumber", rs.getInt("EditionNumber"))
            .field("edition", rs.getString("EditionDescription"))
            .field("createDate", rs.getDate("CreateDt"))
            .field("updateDate", rs.getDate("UpdateDt"))
            .field("description", rs.getString("DescriptionTxt"))
            .field("textbook", rs.getInt("TextBook") == 1)
            .field("catgory.k", "bbbbbbbbbbbbbbbbbbbb")
            .field("longText",
        .endObject())
    .execute()
    .actionGet();
In my code each book gets a different unique ID (the loop
counter). Now I don't know how this data is analyzed on the server:
on which field is an index created?
Because now, when I try to search for a word or a 2-3 word
combination in the whole doc (q=" ") or search directly
inside a field (q=title:),
it takes something like 10-15 seconds.
How can I know on which fields of the JSON an index is created?
Does it have a non-clustered index?
Or how can I decide which fields I want to be indexed?
And what settings can help me improve performance? (Right now all my
settings are the defaults, and my indexing process is simply a loop over the DB
records that sends them using the Java API.)
As a next step I would recommend you take a look at mapping:
Once you index your data, investigate which mappings were used:
This will help you understand which fields are searchable and which are not
(from a quick glance at your example I think all fields are searchable).
Most of your fields (probably all) will be automatically mapped to one of the
core types. You can learn in the documentation below which analysis was
used by default (you can change it, but since you do not use a mapping now, the
default analyzers are used):
The search response is quite slow in your case. Apart from wrong use of the
search API, it can have many other causes (for example, your cluster has
connected to another cluster on your network, some expensive process is running on
the machine, or it does not have enough memory).
BTW, do not hesitate to share your search code as well. (You can use a gist
instead of pasting code snippets into mail.)
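For the question of how to decide which fields get indexed, the following is a rough sketch of what an explicit mapping for a few of the book fields could look like. It assumes the `string` field type with the `index` option (`analyzed`, `not_analyzed`, `no`) used by older ES releases; newer releases split this into `text` and `keyword`, so check the documentation for your version. The type name `book` and the choice of which fields to analyze are assumptions for illustration, not the poster's actual setup.

```java
// Sketch only: the type name and the per-field choices below are assumptions
// based on the book example in this thread, not a recommended schema.
final class BookMapping {
    private BookMapping() {}

    // Mapping for a hypothetical "book" type:
    //  - "title" is analyzed full text (the default for strings),
    //  - "title_exact" is indexed as a single, untouched term,
    //  - "description" stays in the stored document but is not searchable.
    static String json() {
        return "{\n"
            + "  \"book\": {\n"
            + "    \"properties\": {\n"
            + "      \"title\":       { \"type\": \"string\" },\n"
            + "      \"title_exact\": { \"type\": \"string\", \"index\": \"not_analyzed\" },\n"
            + "      \"description\": { \"type\": \"string\", \"index\": \"no\" }\n"
            + "    }\n"
            + "  }\n"
            + "}";
    }

    public static void main(String[] args) {
        // Print the JSON body so it can be sent with a put-mapping request.
        System.out.println(json());
    }
}
```

The mapping that ES derived automatically for an existing index can be inspected with a GET request on the index's `_mapping` endpoint, which is a quick way to see how each field was typed.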
It seems that you use the default settings of ES. For more than 1M docs, I
suggest modifying the memory settings:
set ES_MIN_MEM and ES_MAX_MEM (they default to 256m and 1g).
If you run under 32-bit Windows, you will certainly be constrained by the
physical memory available to the JVM (it needs contiguous memory), so you will certainly
not be able to go beyond 1500m.
As you are "starting" with ES, try to play with fewer documents (IMHO 100k docs
is enough to test search, facets and the index/shards/nodes concepts).
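As a small illustration of the memory settings above, here is a sketch that converts sizes written in the `256m` / `1g` style to megabytes and prints the maximum heap of the JVM it runs in. `HeapCheck` is a hypothetical helper; it reports on its own JVM, not on the running ES process, so treat it only as a way to see what a given heap limit means in practice.

```java
// Sketch only: HeapCheck is a hypothetical helper, not part of Elasticsearch.
final class HeapCheck {
    private HeapCheck() {}

    // Converts a JVM-style size such as "256m" or "1g" to megabytes.
    static long toMegabytes(String size) {
        String s = size.trim().toLowerCase();
        long number = Long.parseLong(s.substring(0, s.length() - 1));
        char unit = s.charAt(s.length() - 1);
        switch (unit) {
            case 'm': return number;
            case 'g': return number * 1024;
            default:
                throw new IllegalArgumentException("expected a size ending in m or g: " + size);
        }
    }

    // Maximum heap the current JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // ES_MAX_MEM is the variable named in the thread; fall back to the
        // default value quoted there when it is not set.
        String configured = System.getenv().getOrDefault("ES_MAX_MEM", "1g");
        System.out.println("configured max: " + toMegabytes(configured) + " MB");
        System.out.println("this JVM's max: " + maxHeapMegabytes() + " MB");
    }
}
```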
About the search testing: I didn't try searching with the Java
API. I just went to the browser address bar and tried some queries,
some on all fields (using q= ) and some on specific
fields (q=:).
The reason I'm starting with 10M docs and not a lower number is that
I'm trying to check whether ES is a good solution for my site's search
engine.
(Right now we use Solr and I need to decide whether to move to ES.)
Do you know if 10M-15M docs with complex queries, returning results in less
than 2 seconds, can be done with ES?
And what load can ES handle? Is 1k search requests per second
reasonable?
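For reference, the browser-style searches described above can be sketched as a small URL builder. The host `localhost:9200` (the default HTTP port), the index name `books` and the type name `book` are assumptions for illustration, and `SearchUrl` is a hypothetical helper. The query text is URL-encoded, so a field query such as `title:java` is sent as `title%3Ajava`.

```java
// Sketch only: SearchUrl is a hypothetical helper; host, index and type
// names passed to it are placeholders for your own setup.
final class SearchUrl {
    private SearchUrl() {}

    // Builds a URI search request: /{index}/{type}/_search?q={query}
    static String build(String host, String index, String type, String query) {
        try {
            String encoded = java.net.URLEncoder.encode(query, "UTF-8");
            return "http://" + host + "/" + index + "/" + type + "/_search?q=" + encoded;
        } catch (java.io.UnsupportedEncodingException e) {
            // UTF-8 is a charset every JVM is required to support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A query against one field, and a free-text query without a field name.
        System.out.println(build("localhost:9200", "books", "book", "title:java"));
        System.out.println(build("localhost:9200", "books", "book", "effective java"));
    }
}
```

The JSON response to such a request includes a `took` value in milliseconds, which helps separate time spent inside ES from time spent in the browser or on the network.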
It's a bit hard to give advice without knowing a bit more
about what you are running: which operating system, how much memory you
have on the server, how much is allocated to the elasticsearch process, and how
many machines you are running (this one I think I can guess: 1). A
simple term-based search should not take more than a second.