Configuring the default analyzer using the Java API not working

Hi there all...

I run ES embedded in my application. Below is the code I use to create
the node and client and to configure the default analyzer, followed by
the code I use to parse a free-text query. The analyzer returned is not
the one I configured. In fact, if I put breakpoints in both the ES
WhitespaceTokenizerFactory and the Lucene WhitespaceTokenizer, neither
is hit.

Questions I have:

  • Do I need to explicitly define the default_index and default_search
    analyzers?
  • When doing such a settings request, do I add to the current
    configuration or do I overwrite the complete analysis config,
    effectively obliterating the whitespace tokenizer declaration?
  • Is there a better way to create this configuration? Is there a way
    to initialize the embedded ES node with a YML config, the same way a
    standalone ES reads from ES_HOME/config?

Thanks all!

Remy

Sorry, here's the code:

    Properties esProperties = new Properties();

    ... esProperties filled from my app config.

    // Create the ES node and client.
    Settings nodeSettings = ImmutableSettings.settingsBuilder().put(esProperties).build();
    elasticSearchNode = NodeBuilder.nodeBuilder().settings(nodeSettings).node();
    elasticSearchClient = elasticSearchNode.client();

    // Default index and analyzer configuration.
    XContentBuilder indexSettings = null;
    try
    {
        indexSettings = XContentFactory.jsonBuilder();
        indexSettings.startObject()
            .startObject("index")
                .startObject("analysis")
                    .startObject("analyzer")
                        .startObject("default")
                            .field("type", "custom")
                            .field("tokenizer", "whitespace")
                            .field("filter", new String[] { "asciifolding", "lowercase" })
                        .endObject()
                    .endObject()
                .endObject()
            .endObject()
        .endObject();

        logger.info(indexSettings.string());

        UpdateSettingsRequestBuilder settingsRequest =
            elasticSearchClient.admin().indices().prepareUpdateSettings();
        settingsRequest.setSettings(indexSettings.string());
        UpdateSettingsResponse settingsResponse = settingsRequest.execute().actionGet();
        ValidateState.notNull(settingsResponse);
    }
    catch (IOException e)
    {
        ValidateState.fail();
    }

And the snippet where I tokenize the user query:

    /*
     * Process the assembled query string one token at a time. The provided
     * analyser must tokenize the query the same way it was done (or in a
     * compatible fashion if you know what you are doing) when the indexing
     * took place.
     */
    EntityInfo entityInfo = getEntityInfo(entityFilters.get(0));

    // Allow this clause to be added as an AND to an existing query.
    BoolQueryBuilder subQuery = QueryBuilders.boolQuery();

    AnalyzeRequestBuilder arb = elasticSearchClient.admin().indices()
        .prepareAnalyze(entityInfo.logicalName, termsAsString);
    // This will use the default analyzer configured for all indices.
    AnalyzeResponse analysis = arb.execute().actionGet();
    for (AnalyzeToken token : analysis.getTokens())
    {
        String term = token.getTerm();
        if (StringUtils.isNotEmpty(term))
        {
            ...

There I see that the whitespace tokenizer is not called...

Thanks!

Solved my issue by loading the analysis config from a properties file
and passing it to the NodeBuilder directly.

Was I too late in using the client to make an update settings request?

R.
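For anyone reading along, here is a minimal sketch of what "loading the analysis config from a properties file" might look like. It only uses java.util.Properties so it can run on its own. The class name, the SAMPLE content and the load helper are made up for illustration, and the key names assume Elasticsearch's flattened "index.analysis..." setting form with array items numbered .0, .1. The hand-off to the node builder is shown only as a comment, copied from the code in the first post.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;
import java.util.TreeSet;

public class AnalysisSettingsSketch {

    // Hypothetical properties-file content. The keys are an assumption: the
    // nested analysis config from the first post, flattened to dotted names,
    // with the filter array written as numbered entries.
    public static final String SAMPLE =
            "index.analysis.analyzer.default.type=custom\n"
          + "index.analysis.analyzer.default.tokenizer=whitespace\n"
          + "index.analysis.analyzer.default.filter.0=asciifolding\n"
          + "index.analysis.analyzer.default.filter.1=lowercase\n";

    // Reads key=value pairs from any Reader (a FileReader in a real application).
    public static Properties load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties esProperties = load(new StringReader(SAMPLE));

        // In the application these properties would be handed to the node
        // builder before the node starts, as in the first post:
        //   Settings nodeSettings =
        //       ImmutableSettings.settingsBuilder().put(esProperties).build();
        //   elasticSearchNode =
        //       NodeBuilder.nodeBuilder().settings(nodeSettings).node();

        // Print the loaded settings in sorted order for inspection.
        for (String key : new TreeSet<String>(esProperties.stringPropertyNames())) {
            System.out.println(key + " = " + esProperties.getProperty(key));
        }
    }
}
```

The point of the design is that the analysis settings are part of the node settings before any index exists, so no update settings request is needed afterwards.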

It's hard to read the code. Can you gist it?

On Friday, July 15, 2011 at 6:20 PM, Spring Ninja wrote:

> Solved my issue by loading the analysis config from a properties file
> and passing it to the NodeBuilder directly.
>
> Was I too late in using the client to make an update settings request.
>
> R.

https://gist.github.com/1085232

You need to provide those settings when you create the index; you can't
update them while the index is open. You can update them if the index is
closed.

On Fri, Jul 15, 2011 at 9:32 PM, Spring Ninja <remy.gendron@ingeno.ca> wrote:

> https://gist.github.com/1085232

hey ninja,
can you give an example on how you loaded the config from a properties file...
code snippet maybe.
i am having the same issue.
my custom analyzers are not being recognized
advTHANKSance
-skylyfe