Mappings & Analyzer


(bcoder) #1

I tried googling and reading elasticsearch.org first, but because I lack
a background in Lucene and the like, I find it tough to grasp the idea
of mappings and analyzers in elasticsearch.

My document schema looks something like this:

{
  "a": ["astr1", "astr2", "astr3", ...],
  "dont_search_me": "blahhhh",
  "substring_search_me": "hello world",
  "b": {
    "c": ["cstr1", "cstr2", ...],
    "description": "A potentially large string here ...",
    "name": "blah",
    "class": {
      "class1": "description for class1",
      "class2": "description for class2",
      ....
    },
    "modified": 1234567 // An integer value (unix timestamp)
  }
}

I want to:

  1. Query by typing in some free text, which should match text in any
    of the fields (including inside the string arrays), except for field:
    "dont_search_me"
  2. I also want to search "keys" in b.class field, i.e., "class1",
    "class2",... should also match search result (key names, and also
    their values)
  3. Filter the result by modified date
  4. A match within field "name" should rank higher than, say, a match
    inside any string in array "c"
  5. The field "substring_search_me" should allow arbitrary substring
    matching inside its content. For example, in the content above, the
    string "ello" should match the field
  6. I want to get all unique strings in "c" (across all search results),
    i.e., facets for "c" along with their counts (in the search output).

Can someone suggest the mapping, analyzers, and kinds of queries I need
for this task?
From my limited understanding, I think I need an NGram analyzer to
allow substring matching (is that correct?). How do I remove the
field "substring_search_me" from the index (apart from removing it when
adding the document to elasticsearch itself)?

Also, how do I allow string matching on "key" names as well (but only
inside the b.class hash, and not for any other key name)? For example, in
the document above, "class1" should match the document (since it is inside
the b.class hash), but "name" should not match (since it is not inside the
b.class hash). How do I specify such a mapping?

How do I stop the content of the "modified" field from being matched by
text searches, while still being able to filter on its value by date
range, etc.?

Thanks a ton for your help in advance!


(Eric Jain) #2

On Apr 27, 12:04 am, bcoder blitzkriegco...@gmail.com wrote:

  1. Query by typing in some free text, which should match text in any
    of the fields (including inside the string arrays), except for field:
    "dont_search_me"

Add "index": "no" to the mapping for the "dont_search_me" field. All
other fields are copied into the _all field by default, and free-text
queries search _all unless you name specific fields.

http://www.elasticsearch.org/guide/reference/mapping/all-field.html
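A minimal sketch of that mapping, written as a Python dict and serialized to JSON. It assumes a hypothetical type name "doc" and the older "type": "string" syntax of the guide pages linked in this thread. Newer Elasticsearch versions use different field types, so check against the version you run.

```python
import json

# Hypothetical type name "doc"; pre-5.0 style string mapping.
mapping = {
    "doc": {
        "properties": {
            "dont_search_me": {
                "type": "string",
                "index": "no",            # not indexed, so not searchable
                "include_in_all": False,  # set explicitly to keep it out of _all
            },
        }
    }
}

# This JSON would be the body of a put-mapping request for the "doc" type.
print(json.dumps(mapping, indent=2))
```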

  2. I also want to search "keys" in b.class field, i.e., "class1",
    "class2",... should also match search result (key names, and also
    their values)

You can do simple field or term queries on the full path, like b.class.class1:description.
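In the schema above, class1 sits under b.class, so its full field path is b.class.class1 and the key name becomes part of the query. The sketch below uses a hypothetical helper, class_key_query, to build a query_string body for such a path. Passing "*" as the value is assumed to match any document that has a value under that key, which is the closest fit to "the key name matches". The helper does not escape query_string special characters.

```python
import json

def class_key_query(key, text):
    """Hypothetical helper: search the value stored under b.class.<key>.
    No escaping of query_string special characters is done here."""
    return {
        "query": {
            "query_string": {
                "query": "b.class.%s:%s" % (key, text)
            }
        }
    }

body = class_key_query("class1", "description")
print(json.dumps(body))

# Wildcard value: documents that have any value under b.class.class1.
print(json.dumps(class_key_query("class1", "*")))
```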

  3. Filter the result by modified date

You can filter on exact dates or date ranges.

http://www.elasticsearch.org/guide/reference/query-dsl/numeric-range-filter.html
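A sketch of a date-range filter on b.modified, assuming the field holds Unix timestamps in seconds (as the schema comment says) and is mapped as a number. It wraps a free-text query in a "filtered" query with a numeric_range filter, as in the older query DSL linked above. The search text and dates are placeholders. The range is half-open, [start, end), so it covers the whole of April 2012.

```python
import calendar
import json
from datetime import datetime

def to_unix(dt):
    """Convert a naive datetime, read as UTC, to whole seconds since the epoch."""
    return calendar.timegm(dt.utctimetuple())

def modified_between(query, start, end):
    """Keep only hits whose b.modified lies in [start, end)."""
    return {
        "query": {
            "filtered": {
                "query": query,
                "filter": {
                    "numeric_range": {
                        "b.modified": {"gte": to_unix(start), "lt": to_unix(end)}
                    }
                },
            }
        }
    }

body = modified_between(
    {"query_string": {"query": "hello"}},
    datetime(2012, 4, 1),
    datetime(2012, 5, 1),
)
print(json.dumps(body, indent=2))
```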

  4. A match within field "name" should rank higher than, say, a match
    inside any string in array "c"

You can set a boost either at index time or in the query, for example a higher boost on the "name" field.

http://www.elasticsearch.org/guide/reference/mapping/boost-field.html
http://www.elasticsearch.org/guide/reference/query-dsl/boosting-query.html
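One query-time option is to list fields in a query_string query with a "^" boost, so a hit in b.name scores higher than a hit that only shows up in _all, for example inside b.c. This is a minimal sketch. The factor 3 and the helper name boosted_search are arbitrary choices for illustration.

```python
import json

def boosted_search(text, name_boost=3):
    """Hypothetical helper: free-text search over _all, with b.name boosted."""
    return {
        "query": {
            "query_string": {
                "query": text,
                "fields": ["_all", "b.name^%d" % name_boost],
            }
        }
    }

body = boosted_search("blah")
print(json.dumps(body, indent=2))
```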

  5. The field "substring_search_me" should allow arbitrary substring
    matching inside its content. For example, in the content above, the
    string "ello" should match the field

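You can do wildcard queries (e.g. *ello*) on "not_analyzed" fields, but
these can be slow, since a leading wildcard has to check every term in
the field. An ngram tokenizer should avoid that by indexing substrings up
front, at the cost of a larger index.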

http://www.elasticsearch.org/guide/reference/query-dsl/wildcard-query.html
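To see why an ngram tokenizer enables substring matching, the helper below roughly mimics what it emits: every substring between min_gram and max_gram characters long, lowercased. "ello" therefore becomes an indexed term of its own. The settings dict is a sketch of a custom analyzer in the older index-settings syntax. The analyzer and tokenizer names and the 2/10 gram sizes are hypothetical choices. Substrings longer than max_gram are never indexed as single terms.

```python
import json

def ngrams(text, min_gram, max_gram):
    """Approximate nGram tokenizer output: all substrings of length
    min_gram..max_gram of the lowercased text (spaces included)."""
    text = text.lower()
    grams = set()
    for n in range(min_gram, max_gram + 1):
        for i in range(len(text) - n + 1):
            grams.add(text[i:i + n])
    return grams

indexed = ngrams("hello world", 2, 10)
print("ello" in indexed)  # True: contiguous substring, indexed as a term
print("elo" in indexed)   # False: not a contiguous substring

# Index settings sketch; names and gram sizes are hypothetical.
settings = {
    "analysis": {
        "tokenizer": {
            "substring_tokenizer": {"type": "nGram", "min_gram": 2, "max_gram": 10}
        },
        "analyzer": {
            "substring_analyzer": {
                "type": "custom",
                "tokenizer": "substring_tokenizer",
                "filter": ["lowercase"],
            }
        },
    }
}
field_mapping = {
    "substring_search_me": {"type": "string", "analyzer": "substring_analyzer"}
}
print(json.dumps(settings, indent=2))
```

With the same analyzer at search time, the query "ello" is also split into grams, so documents sharing only some of those grams can match with lower scores. A separate search-side analyzer is a common refinement.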

  6. I want to get all unique strings in "c" (across all search results),
    i.e., facets for "c" along with their counts (in the search output).

Terms facets should work on fields inside inner objects, such as b.c.

http://www.elasticsearch.org/guide/reference/api/search/facets/terms-facet.html
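A sketch of a request with a terms facet on b.c, plus a small helper that turns the facet section of a response into a dict. The facet name "c_values", the size, and the sample response are made up for illustration, shaped like a terms facet result. Facets count indexed terms. If b.c is analyzed, multi-word strings are split into tokens, so mapping b.c as "not_analyzed" is usually needed to get whole strings back.

```python
import json

# Hypothetical facet name "c_values"; "size" caps how many terms come back.
facet_body = {
    "query": {"query_string": {"query": "hello"}},
    "facets": {
        "c_values": {"terms": {"field": "b.c", "size": 100}}
    },
}
print(json.dumps(facet_body, indent=2))

def facet_counts(response, name):
    """Turn a terms facet section of a search response into {term: count}."""
    return {t["term"]: t["count"] for t in response["facets"][name]["terms"]}

# Made-up response fragment, for illustration only.
sample = {
    "facets": {
        "c_values": {
            "_type": "terms",
            "terms": [{"term": "cstr1", "count": 7}, {"term": "cstr2", "count": 3}],
        }
    }
}
print(facet_counts(sample, "c_values"))  # {'cstr1': 7, 'cstr2': 3}
```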

.. How do I remove the
field "substring_search_me" from the index (apart from removing it when
adding the document to elasticsearch itself)?

You can either not index it (see above), or exclude it from _source, in
which case it can still be indexed and searched, but won't be present
when you retrieve the document.

http://www.elasticsearch.org/guide/reference/mapping/source-field.html
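A sketch of the _source option, assuming a version whose _source mapping accepts "excludes" (older releases may not). The type name "doc" is hypothetical. The field stays indexed and searchable, but it is left out of the stored _source, so it also can't be recovered later from _source, for example when reindexing.

```python
import json

# Hypothetical type name "doc"; "excludes" keeps the field out of stored _source.
mapping = {
    "doc": {
        "_source": {"excludes": ["substring_search_me"]},
        "properties": {
            "substring_search_me": {"type": "string"}
        },
    }
}
print(json.dumps(mapping, indent=2))
```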

Also, how do I allow string matching on "key" names as well (but only
inside the b.class hash, and not for any other key name)? For example, in
the document above, "class1" should match the document (since it is inside
the b.class hash), but "name" should not match (since it is not inside the
b.class hash). How do I specify such a mapping?

There is no need for a special mapping; you can specify the path in
the query (see above).

How do I stop the content of the "modified" field from being matched by
text searches, while still being able to filter on its value by date
range, etc.?

Add "include_in_all" : false to the mapping for the "modified" field.
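A sketch of that mapping, assuming a hypothetical type name "doc" and that "modified" is stored as a Unix timestamp in a numeric "long" field under the b object. The field stays indexed as a number, so range filters on b.modified still work. It is simply no longer copied into _all, so free-text queries won't match it.

```python
import json

# Hypothetical type name "doc"; "modified" lives inside the "b" object.
mapping = {
    "doc": {
        "properties": {
            "b": {
                "type": "object",
                "properties": {
                    "modified": {
                        "type": "long",          # Unix timestamp as a number
                        "include_in_all": False, # not copied into _all
                    }
                },
            }
        }
    }
}
print(json.dumps(mapping, indent=2))
```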

