Hello,
I'm writing this post because I am a new ES user and I am not sure whether I correctly understand the index and store attributes of string fields in an explicit mapping.
I want to have an ES index that contains a list of internet sites.
The document type "site" has 3 fields: url, content, inner_note.
I will be searching for documents that contain given phrases in the "content" field.
I will be retrieving a single document that has a particular URL in the "url" field.
The field "inner_note" is only for my internal use, and I will not use it for searching or retrieving documents.
Have I chosen the optimal "store" and "index" attributes for my scenario?
Is retrieving a document by the url field as fast as retrieving it by ID? Compared to the traditional SQL version: if I want to retrieve a single row from a SQL table by WHERE url = ?, I would create an SQL index on the url column.
> I'm writing this post because I am a new ES user and I am not sure
> whether I correctly understand the index and store attributes of string
> fields in an explicit mapping.
The "index" attribute tells ES how to index your field:
- "analyzed" will break it into terms, so by default you'll be able to
  search, for example, for a single word in that field
- "not_analyzed" will index your field as a single term, so only exact
  matches will return results
- "no" won't index your field at all, making it unsearchable
"store" tells ES to store the contents of your field or not.
However, there are two more things to consider here:
- the _source field
  (http://www.elasticsearch.org/guide/reference/mapping/source-field/),
  which by default stores all your fields. So even if you don't store any
  field, you can retrieve its contents, because ES will take them from
  _source. Typically, you'd store individual fields if you disable _source.
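A toy Python model of where a returned field value comes from, following the description above: a stored field is read directly, otherwise the value is pulled out of _source, and with _source disabled an unstored field cannot be returned at all. The function name and its parameters are made up for illustration; this is not how ES is implemented internally.

```python
from typing import Optional

# Toy model (hypothetical helper) of retrieving a field value, assuming the
# behaviour described above rather than actual ES internals.

def fetch_field(doc: dict, field: str, stored_fields: set,
                source_enabled: bool = True) -> Optional[str]:
    if field in stored_fields:
        return doc[field]        # read from the separately stored field
    if source_enabled:
        return doc.get(field)    # extracted from the original JSON kept in _source
    return None                  # neither stored nor in _source: not retrievable

site = {"url": "http://example.com/", "content": "Hello world", "inner_note": "check later"}
print(fetch_field(site, "content", stored_fields=set()))                              # Hello world
print(fetch_field(site, "content", stored_fields=set(), source_enabled=False))        # None
print(fetch_field(site, "content", stored_fields={"content"}, source_enabled=False))  # Hello world
```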
> I want to have an ES index that contains a list of internet sites.
> The document type "site" has 3 fields: url, content, inner_note.
> I will be searching for documents that contain given phrases in the
> "content" field.
> I will be retrieving a single document that has a particular URL in the
> "url" field.
> The field "inner_note" is only for my internal use, and I will not use it
> for searching or retrieving documents.
> Have I chosen the optimal "store" and "index" attributes for my scenario?
It should work. I'm not sure about the "optimal" part. I wouldn't store the
"content" field, because it can be retrieved anyway from _source.
> Is retrieving a document by the url field as fast as retrieving it by ID?
Getting by ID is faster, because it doesn't involve a search (even so,
your url searches should be fast enough: since the field is not_analyzed,
each URL is a single term, so you have a low number of terms). Plus,
getting is realtime, as described here:
> Compared to the traditional SQL version: if I want to retrieve a single
> row from a SQL table by WHERE url = ?, I would create an SQL index on the
> url column.
In the case of ES, fields are indexed by default. You have to set
"index": "no" if you don't want a field to be indexed.
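To contrast getting by ID with searching by URL, the sketch below builds the path for a get-by-ID request and the body of a term query on the not_analyzed url field. The index name "sites", the helper names and the example values are hypothetical; the path follows the /index/type/id layout of the ES versions discussed here, and a term query is used because it is not analyzed, so it compares against the single term that not_analyzed produces.

```python
import json
from urllib.parse import quote

# Hypothetical names for illustration only.
INDEX, DOC_TYPE = "sites", "site"

def get_by_id_path(doc_id: str) -> str:
    """Path for a realtime GET of one document by its _id (no search involved)."""
    return f"/{INDEX}/{DOC_TYPE}/{quote(doc_id, safe='')}"

def search_by_url_body(url: str) -> dict:
    """Search body for an exact match on the not_analyzed "url" field."""
    # A term query is not analyzed, so the whole URL must match the one
    # indexed term exactly.
    return {"query": {"term": {"url": url}}}

print(get_by_id_path("42"))  # /sites/site/42
print(json.dumps(search_by_url_body("http://example.com/")))
```

The get needs only the ID, while the term query has to be executed as a search, which is why getting by ID is the faster of the two.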