How to search on multiple fields?


(Navvi) #1

We have indexed all the records from our table into ES.
We are now searching the indexed records on several fields.

Here is the code snippet we use for indexing:

for (final Contact contact : contacts) {
    final XContentBuilder builder = XContentFactory.jsonBuilder().startObject();
    builder.field("CONTACT_ID", contact.getId());
    final Map<String, Object> attributes = contact.getAttributes();
    for (final String fieldName : attributes.keySet()) {
        builder.field(fieldName, attributes.get(fieldName));
    }
    builder.endObject();
    final IndexRequestBuilder prepareIndex = client.prepareIndex(
            indexName, type,
            StringUtils.lowerCase(String.valueOf(contact.getId())));
    prepareIndex.setSource(builder);
    bulkRequest.add(prepareIndex);
}

BulkResponse bulkResponse = bulkRequest.execute().actionGet();

We did not specify any analyzer while indexing. Which analyzer does ES use
by default if none is specified?
If we want to set one explicitly, which is the better choice, keyword or
standard, and how do we configure it through the Java API while indexing?

Here is the code we use to search on multiple fields in the ES data.

The ES layer receives a list of search attributes from the upper layer:

class SearchAttribute {
    private String attributeName;
    private String attributeValue;

    // setters & getters
}

From the list of search attributes we build the query like this:

for (SearchAttributeDo searchAttribute : searchAttributes) {
    boolQueryBuilder.should(QueryBuilders.termQuery(
            searchAttribute.getAttributeName(),
            searchAttribute.getAttributeValue()));
}

SearchRequestBuilder srb = client.prepareSearch()
        .setTypes(repoName)
        .setQuery(boolQueryBuilder);
SearchResponse searchResponse = srb.execute().actionGet();

One strange thing we observed: results come back only when the attribute
value is given in lower case. For example, if "Lawrence" is indexed as the
value of FIELD_NAME and we query with "Lawrence", we get no results. But when
we query with "lawrence", we get a response back.

I think that might be because Lucene internally converts everything to
lower case while indexing.

Could you please advise us on the following:

  1. Which analyzers are best to use for indexing and searching?
    The values we index and search look like TELE_PHONE
    --> "510/782-1117" and EMAIL --> "elasticquery@help.com".

  2. Is the approach we followed the right way to search on multiple fields,
    each with its own value? Or could you suggest a better way? We used a
    bool query, built by iterating over the list of search attribute
    objects.

Thanks,
Navvi.

--


(Runar Myklebust-2) #2

On Sun, Aug 12, 2012 at 8:05 PM, Navvi vnaveen@prokarmasoftech.com wrote:

We did not specify any analyzer while indexing. Which analyzer does ES use
by default if none is specified?

http://www.elasticsearch.org/guide/reference/index-modules/analysis/standard-analyzer.html

If we want to set one explicitly, which is the better choice, keyword or
standard, and how do we configure it through the Java API while indexing?

You can set the analyzer when creating the index, e.g.:

CreateIndexRequest createIndexRequest = new CreateIndexRequest( indexName );
createIndexRequest.settings( indexSettingBuilder.buildIndexSettings() );
client.admin().indices().create( createIndexRequest ).actionGet();

public Settings buildIndexSettings()
{
    final ImmutableSettings.Builder settings = ImmutableSettings.settingsBuilder();
    .....
    settings.loadFromSource( buildAnalyzerSettings() );

    return settings.build();
}

private String buildAnalyzerSettings()
{
    try
    {
        return jsonBuilder()
            .startObject()
                .startObject( "analysis" )
                    .startObject( "analyzer" )
                        .startObject( "whitespace_analyzer" )
                            .field( "type", "custom" )
                            .field( "tokenizer", "whitespace" )
                            .field( "filter", new String[]{"lowercase"} )
                        .endObject()
                    .endObject()
                .endObject()
            .endObject()
            .string();
    }
    catch ( IOException e )
    {
        throw new ContentIndexException( "Not able to create analyzer settings for index", e );
    }
}

You can also apply the analyzer directly in your mapping document.
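
As a rough illustration of what such a mapping document could look like, the sketch below assembles the JSON with plain Java strings so that it runs without the Elasticsearch jar. The type name "contact", the class FieldAnalyzerMapping and its build method are hypothetical names made up for this example; "whitespace_analyzer" is the analyzer defined in the settings above, and the field names are taken from the question. It assumes names that need no JSON escaping.

```java
import java.util.Arrays;
import java.util.List;

public class FieldAnalyzerMapping {

    // Builds a mapping document that assigns one analyzer to each listed
    // string field. Assumption: type, analyzer and field names contain no
    // characters that would need JSON escaping.
    public static String build(String type, String analyzer, List<String> fieldNames) {
        StringBuilder json = new StringBuilder();
        json.append("{\"").append(type).append("\":{\"properties\":{");
        for (int i = 0; i < fieldNames.size(); i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append('"').append(fieldNames.get(i)).append("\":")
                .append("{\"type\":\"string\",\"analyzer\":\"")
                .append(analyzer).append("\"}");
        }
        json.append("}}}");
        return json.toString();
    }

    public static void main(String[] args) {
        // "contact" is a hypothetical type name used only for illustration.
        System.out.println(build("contact", "whitespace_analyzer",
                Arrays.asList("TELE_PHONE", "EMAIL")));
    }
}
```

The resulting string would then be sent through the put-mapping API, or supplied when the index is created; the exact client call depends on your Elasticsearch version.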

Here is the code we use to search on multiple fields in the ES data.

The ES layer receives a list of search attributes from the upper layer:

class SearchAttribute {
    private String attributeName;
    private String attributeValue;

    // setters & getters
}

From the list of search attributes we build the query like this:

for (SearchAttributeDo searchAttribute : searchAttributes) {
    boolQueryBuilder.should(QueryBuilders.termQuery(
            searchAttribute.getAttributeName(),
            searchAttribute.getAttributeValue()));
}

SearchRequestBuilder srb = client.prepareSearch()
        .setTypes(repoName)
        .setQuery(boolQueryBuilder);
SearchResponse searchResponse = srb.execute().actionGet();

One strange thing we observed: results come back only when the attribute
value is given in lower case. For example, if "Lawrence" is indexed as the
value of FIELD_NAME and we query with "Lawrence", we get no results. But when
we query with "lawrence", we get a response back.

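The default analyzer lowercases tokens at index time, and a term query is
not analyzed, so the search value has to match the lowercased token exactly.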
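
To make that concrete, here is a small stdlib-only simulation, not the Elasticsearch API. The analyze method is a hypothetical, much simplified stand-in for an analyzer that splits text and lowercases it, and termMatches mimics a term query, which compares the search value with the stored tokens as-is. The real analyzer's splitting rules differ in detail.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class LowercaseTermDemo {

    // Simplified stand-in for index-time analysis: split on anything that is
    // not a letter or digit, then lowercase each token.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Mimics a term query: the search value is NOT analyzed, it is compared
    // verbatim with the tokens that were stored at index time.
    public static boolean termMatches(Set<String> indexedTokens, String searchValue) {
        return indexedTokens.contains(searchValue);
    }

    public static void main(String[] args) {
        Set<String> name = new HashSet<String>(analyze("Lawrence"));
        System.out.println(termMatches(name, "Lawrence")); // false
        System.out.println(termMatches(name, "lawrence")); // true

        // The phone number is stored as several tokens, so the full value
        // does not match any single term either.
        Set<String> phone = new HashSet<String>(analyze("510/782-1117"));
        System.out.println(phone.contains("782"));                // true
        System.out.println(termMatches(phone, "510/782-1117"));   // false
    }
}
```

So with term queries the value has to be given exactly as the token was stored (for example by lowercasing it before building the query), or the field has to be indexed in a way that keeps the original value as a single term.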

I think that might be because Lucene internally converts everything to
lower case while indexing.

Could you please advise us on the following:

  1. Which analyzers are best to use for indexing and searching?
    The values we index and search look like TELE_PHONE
    --> "510/782-1117" and EMAIL --> "elasticquery@help.com".

Search and read the documentation; there are many different
alternatives depending on your particular needs:
http://www.elasticsearch.org/guide/reference/index-modules/analysis/

  2. Is the approach we followed the right way to search on multiple fields,
    each with its own value? Or could you suggest a better way? We used a
    bool query, built by iterating over the list of search attribute
    objects.

I would certainly drop the bool query for cases with just one search value,
but that's just a minor detail.

greetings

Runar Myklebust
Enonic AS
Senior Developer

http://enonic.com/download

--

