Few questions about analyzers, mappings and queries in the context of search for substrings


(GianMaria Romanato) #1

Ciao all,

I spent the last few hours studying the ElasticSearch documentation, reading
posts in this group, and searching Stack Overflow and the Web in general.

I managed to get substring search working as follows:

  • created an empty index
  • defined index analysis settings to use a custom analyzer that uses
    an nGram filter
  • defined mappings for an index and type specifying the above analyzer
    for each property
  • posted some documents to the right index and type

Everything works as expected, but I still have some doubts:

  1. the elasticsearch documentation states that a mapping definition
    is not needed in general
    (http://www.elasticsearch.org/guide/reference/mapping/), but in my tests
    searching for substrings did not work without mappings. Is that expected,
    or is there a mistake in my code and substring search should work without
    mappings?
  2. is there a way to define mappings that apply to all the types in an
    index?
  3. searching for substrings works, but only if I specify the field
    name in the query (curl "http://localhost:9200/1/_search?q=description:desc"),
    while a query with no field name returns no hits (curl
    "http://localhost:9200/1/_search?q=desc"). Why?
  4. I created the index and specified settings and mappings using curl.
    When using the Java API, is it correct to pass the exact same JSON below
    to ImmutableSettings.Builder.loadFromSource()?

Here are the settings and mappings I used:
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "type": "custom",
            "tokenizer": "lowercase",
            "filter": ["my_ngram"]
          }
        },
        "filter": {
          "my_ngram": {
            "type": "nGram",
            "min_gram": 3,
            "max_gram": 20
          }
        }
      }
    }
  },
  "mappings": {
    "overview": {
      "properties": {
        "title":       { "type": "string", "analyzer": "my_analyzer", "boost": 10 },
        "description": { "type": "string", "analyzer": "my_analyzer", "boost": 10 },
        "id":          { "type": "string", "analyzer": "my_analyzer", "boost": 10 },
        "author":      { "type": "string", "analyzer": "my_analyzer", "boost": 10 }
      }
    }
  }
}
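
To make the effect of these settings concrete, here is a minimal Python sketch of the terms such an analyzer would be expected to produce. It only imitates the behaviour (a lowercase tokenizer that splits on non-letters, followed by every substring of 3 to 20 characters); it is not Lucene's implementation, the function names are made up for illustration, and the order of the grams is not meant to match what Lucene emits.

```python
import re

def lowercase_tokenize(text):
    """Imitate the lowercase tokenizer: split on non-letters, then lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]

def ngrams(token, min_gram=3, max_gram=20):
    """Return every substring of token whose length is in [min_gram, max_gram]."""
    grams = []
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams

def analyze(text):
    """Tokenize, then expand each token into its n-grams (like my_analyzer)."""
    return [g for tok in lowercase_tokenize(text) for g in ngrams(tok)]

indexed = analyze("Short Description")
print("desc" in indexed)  # True: "desc" is a 4-character gram of "description"
print("de" in indexed)    # False: shorter than min_gram, so never produced
```

In this sketch a token shorter than three characters produces no grams at all, which is worth keeping in mind when choosing min_gram.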

Thanks in advance.
GianMaria.

--


(GianMaria Romanato) #2

I have another question:

I split the settings and mappings into two JSON files, because I would
like the mappings to be applied to several index types whose names are not
known in advance.

So I am creating an index specifying only the settings, and then, the first
time a new type is used, I create a PutMappingRequest to pass the
mapping for that index type.
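
The split described above can be sketched in Python as follows. This only shows how the two JSON documents could be derived from the combined one; it does not call Elasticsearch. The type name "invoice" and the helper names are hypothetical, the field list is trimmed to one field, and whether the Java API expects the settings with or without an outer "settings" key is not something this thread confirms.

```python
import copy
import json

# Combined document in the shape posted earlier in the thread (one field only).
COMBINED = {
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "type": "custom",
                        "tokenizer": "lowercase",
                        "filter": ["my_ngram"],
                    }
                },
                "filter": {
                    "my_ngram": {"type": "nGram", "min_gram": 3, "max_gram": 20}
                },
            }
        }
    },
    "mappings": {
        "overview": {
            "properties": {
                "title": {"type": "string", "analyzer": "my_analyzer", "boost": 10}
            }
        }
    },
}

def split_config(combined, template_type="overview"):
    """Return (settings, properties) taken from a combined create-index body."""
    settings = copy.deepcopy(combined["settings"])
    properties = copy.deepcopy(combined["mappings"][template_type]["properties"])
    return settings, properties

def mapping_for_type(type_name, properties):
    """Build a mapping body for a type whose name is only known at runtime."""
    return {type_name: {"properties": copy.deepcopy(properties)}}

settings, properties = split_config(COMBINED)
settings_json = json.dumps(settings)                               # sent at index creation
mapping_json = json.dumps(mapping_for_type("invoice", properties)) # sent per new type
print(mapping_json)
```

The mapping refers to my_analyzer by name only, so it can be resolved only if the settings that define that analyzer were actually applied to the index beforehand.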

However, this approach fails, because the analyzer is defined in the
settings but when I try to provide mappings for the index type the parser
cannot resolve the analyzer:

org.elasticsearch.index.mapper.MapperParsingException: Analyzer [my_analyzer] not found for field [title]
    at org.elasticsearch.index.mapper.core.TypeParsers.parseField(TypeParsers.java:74) ~[elasticsearch-0.19.8.jar:na]
    at org.elasticsearch.index.mapper.core.StringFieldMapper$TypeParser.parse(StringFieldMapper.java:116) ~[elasticsearch-0.19.8.jar:na]
    at org.elasticsearch.index.mapper.object.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:261) ~[elasticsearch-0.19.8.jar:na]
    at org.elasticsearch.index.mapper.object.ObjectMapper$TypeParser.parse(ObjectMapper.java:217) ~[elasticsearch-0.19.8.jar:na]
    at org.elasticsearch.index.mapper.DocumentMapperParser.parse(DocumentMapperParser.java:161) ~[elasticsearch-0.19.8.jar:na]
    at org.elasticsearch.index.mapper.MapperService.parse(MapperService.java:271) ~[elasticsearch-0.19.8.jar:na]
    at org.elasticsearch.cluster.metadata.MetaDataMappingService$4.execute(MetaDataMappingService.java:317) ~[elasticsearch-0.19.8.jar:na]
    at org.elasticsearch.cluster.service.InternalClusterService$2.run(InternalClusterService.java:211) ~[elasticsearch-0.19.8.jar:na]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) ~[na:1.7.0_07]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) ~[na:1.7.0_07]

Any idea?

Giamma.

--


(Ivan Brusic) #3

Hi GianMaria,

Answers inline.

On Mon, Sep 24, 2012 at 5:03 AM, GianMaria Romanato
gm.romanato@gmail.com wrote:

the elasticsearch documentation states that the mappings definition is not
needed in general (http://www.elasticsearch.org/guide/reference/mapping/),
but in my tests search for substrings did not work without mappings. Is this
the case or there is a mistake in my code and search for substrings should
work without mappings?

ElasticSearch will work without an explicit mapping, but only with the
default behavior. That same webpage states: "Only when the defaults
need to be overridden must a mapping definition be provided.".
Searching by substring is definitely not the default.

is there a way to define mappings that apply to all the types in an index?

Not sure, but take a look at index templates. Perhaps the type name
can be a wildcard.

http://www.elasticsearch.org/guide/reference/api/admin-indices-templates.html
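
As a rough illustration of the index-template idea, the Python sketch below assembles a template body. It makes several assumptions that should be checked against the linked page for your version: that the body takes a "template" index-name pattern together with "settings" and "mappings", and that a "_default_" mapping type is used to cover every type (this replaces the wildcard type name guessed at above). The pattern "catalog-*" and the helper name are hypothetical.

```python
import json

def build_template(pattern, analysis, properties):
    """Assemble a template body: analysis settings plus one mapping for all types."""
    return {
        "template": pattern,  # index name pattern (assumed key name)
        "settings": {"index": {"analysis": analysis}},
        # "_default_" is an assumption about how to target every type.
        "mappings": {"_default_": {"properties": properties}},
    }

analysis = {
    "analyzer": {
        "my_analyzer": {
            "type": "custom",
            "tokenizer": "lowercase",
            "filter": ["my_ngram"],
        }
    },
    "filter": {"my_ngram": {"type": "nGram", "min_gram": 3, "max_gram": 20}},
}
properties = {"title": {"type": "string", "analyzer": "my_analyzer"}}

body = build_template("catalog-*", analysis, properties)
print(json.dumps(body, indent=2))
```

Keeping the analyzer definition and the mapping in the same body means the mapping never refers to an analyzer that the index does not define.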

search for substring works, but only if I am specifying the field name in
the query (curl "http://localhost:9200/1/_search?q=description:desc"
), while a text query returns no hits (curl
"http://localhost:9200/1/_search?q=desc"). Why?

Queries with no field specified will use the _all field, which uses
the default analyzer.

http://www.elasticsearch.org/guide/reference/mapping/all-field.html
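
The difference between the two queries can be modelled with a small Python sketch. It is a deliberately simplified picture of term matching, not how Elasticsearch is implemented: the default analyzer is approximated as keeping the whole lowercased word, a document counts as a hit when any query term equals an indexed term, and the helper names are made up for illustration.

```python
def ngrams(token, min_gram=3, max_gram=20):
    """Return the set of substrings of token with length in [min_gram, max_gram]."""
    return {
        token[start:start + size]
        for size in range(min_gram, min(max_gram, len(token)) + 1)
        for start in range(len(token) - size + 1)
    }

def hit(query_terms, indexed_terms):
    """A document matches if any query term is among its indexed terms."""
    return bool(query_terms & indexed_terms)

text = "description"
description_terms = ngrams(text)  # field mapped with my_analyzer
all_terms = {text}                # _all, modelled as whole-word terms only

# q=description:desc -> the query is analyzed with the field's analyzer
print(hit(ngrams("desc"), description_terms))  # True

# q=desc -> the query goes to _all, where only the whole word was indexed
print(hit({"desc"}, all_terms))                # False
```

Under this model the fieldless query can only match when the query text is a complete word in the document.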

I created the index and specified settings and mappings using CURL. When
using the Java API is it correct to use the exact same JSON below as the
input of ImmutableSettings.Builder.loadFromSource() ?

I have never tried to use JSON with the Java API.

Cheers,

Ivan

--


(GianMaria Romanato) #4

Hi Ivan,

thank you very much for your helpful reply, it clarified some of my doubts.
I did not try your suggestion of using templates because I managed to
create the mapping on demand using the Java API and a template JSON
document.

In fact, with respect to my last question about JSON, I confirm that it is
possible to set the type mapping or the index settings using a JSON
document via the Java API. However, due to the lack of good examples, I found
it a bit difficult to create correct JSON documents. So after a couple of
unsuccessful tries, I decided to first create an index and a type with
default settings and an auto-generated mapping. Then I used the _settings and
_mapping REST APIs to query the configuration, and the responses
helped me figure out how to write a correct JSON document for the
index settings and a second JSON document for the type mapping.
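
The inspection step can be sketched in Python with only the standard library. The URL layout below (index/_settings and index/type/_mapping) is an assumption based on the REST API of that era and may differ in other versions; the function names are hypothetical, and the index name "1" and type "overview" are the ones used earlier in this thread.

```python
import json
from urllib.request import urlopen

def settings_url(base, index):
    """URL that returns the settings of an index."""
    return f"{base}/{index}/_settings"

def mapping_url(base, index, type_name):
    """URL that returns the mapping of one type in an index."""
    return f"{base}/{index}/{type_name}/_mapping"

def fetch_json(url):
    """GET a URL and decode the JSON response."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))

def dump_config(base, index, type_name):
    """Print the live settings and mapping so they can serve as templates."""
    for url in (settings_url(base, index), mapping_url(base, index, type_name)):
        print(url)
        print(json.dumps(fetch_json(url), indent=2))

# Example call (needs a running node):
# dump_config("http://localhost:9200", "1", "overview")
```

The printed documents show the exact nesting the server uses, which is the part that is hard to guess when writing the JSON by hand.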

Again, thank you for your help.
Giamma.

--

