Universal search

Hello everyone.
I have a string "Jhon Abraham 18". I want to build a search query that splits the string on spaces and searches for each word in an index. The search has to run against all fields of the index, because I don't know which value maps to which field.
So, I have a document:

{
  "_index": "recipient",
  "_type": "recipient",
  "_id": "37a15258d9",
  "_version": 1,
  "_score": 1,
  "_source": {
    "name": "Jhon ",
    "surname": "Abraham",
    "age": "18 "
  }
}

and I don't know which fields of the index the values Jhon, Abraham and 18 correspond to. I only have the string, and I want to use it to search all fields of the indexed documents. I can split it into separate words on spaces, but I don't know the exact fields to search. Also, I want to do this in Java.
I'd appreciate any help.

The answer depends on the version of Elasticsearch that you are using. In 6.x and above you can use a multi_match query with "*" in fields. For the best results, numeric fields should be mapped as strings; otherwise you might run into issues:

DELETE test

PUT test
{
  "settings": {
    "number_of_shards": 1
  }
}

PUT test/doc/1
{
    "name": "Jhon",
    "surname": "Abraham",
    "age": "18 "
}


PUT test/doc/2
{
    "name": "John",
    "surname": "Smith",
    "age": "19 "
    
}

POST test/_search
{
  "query": {
    "multi_match": {
      "query": "John Abraham 18",
      "fields": ["*"],
      "lenient": "true",
      "type": "most_fields"
    }
  }
}

Thanks a lot. Do you know of a similar example in Java?

No, I don't. What's the issue with Java?

I have Java code:

List<Recipient> searchRecipients = new ArrayList<>();
SearchRequest idSearchRequest = new SearchRequest("test");
SearchSourceBuilder idSearchSourceBuilder = new SearchSourceBuilder();

// Search every field ("*") for the words in searchParameters.
QueryBuilder qb = multiMatchQuery(searchParameters, "*").type(MOST_FIELDS);
idSearchSourceBuilder.query(qb);
idSearchRequest.source(idSearchSourceBuilder);

SearchResponse searchResponse;
try {
    searchResponse = esClient.search(idSearchRequest, RequestOptions.DEFAULT);
    SearchHit[] searchHits = searchResponse.getHits().getHits();
    for (SearchHit searchHit : searchHits) {
        // Deserialize each hit's _source into a Recipient.
        Recipient searchHitRecipient = modelMapper().readValue(searchHit.getSourceAsString(), Recipient.class);
        searchRecipients.add(searchHitRecipient);
    }
} catch (IOException e) {
    log.error(e.getMessage());
}

When I run this code, it produces this request:

POST test/_search
{
  "multi_match" : {
    "query" : "1 254898",
    "fields" : [
      "*^1.0"
    ],
    "type" : "most_fields",
    "operator" : "OR",
    "slop" : 0,
    "prefix_length" : 0,
    "max_expansions" : 50,
    "zero_terms_query" : "NONE",
    "auto_generate_synonyms_phrase_query" : true,
    "fuzzy_transpositions" : true,
    "boost" : 1.0
  }
}

I get this error:

{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "Unknown key for a START_OBJECT in [multi_match].",
        "line": 2,
        "col": 19
      }
    ],
    "type": "parsing_exception",
    "reason": "Unknown key for a START_OBJECT in [multi_match].",
    "line": 2,
    "col": 19
  },
  "status": 400
}
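
The parsing_exception above is consistent with a request body that starts directly at multi_match: the _search endpoint expects the clause nested under a top-level "query" key, as in the earlier example. A minimal sketch of that wrapping in plain Java follows. The class and method names (SearchBodySketch, wrapInQuery) are made up for illustration and are not part of any Elasticsearch client.

```java
public class SearchBodySketch {

    // Wraps a bare query clause (for example the JSON of a multi_match clause)
    // in the top-level "query" object that the _search endpoint expects.
    // Hypothetical helper for illustration; not an Elasticsearch API.
    static String wrapInQuery(String queryClauseJson) {
        return "{\"query\":" + queryClauseJson.trim() + "}";
    }

    public static void main(String[] args) {
        String clause = "{\"multi_match\":{\"query\":\"1 254898\","
                + "\"fields\":[\"*\"],\"type\":\"most_fields\"}}";
        // Prints the clause nested under "query", ready to paste after POST test/_search.
        System.out.println(wrapInQuery(clause));
    }
}
```

This should matter only when a printed clause is pasted into a console by hand. When the request goes through the Java client with a SearchSourceBuilder, as in the code above, the client sends the full body itself.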

After changing the query to:

POST test/_search
{
  "query": {
    "multi_match" : {
      "query" : "1 254898",
      "fields" : [
        "recipient.document.number^1.0",
        "recipient.document.type^1.0"
      ],
      "type" : "most_fields",
      "operator" : "OR",
      "slop" : 0,
      "prefix_length" : 0,
      "max_expansions" : 50,
      "zero_terms_query" : "NONE",
      "auto_generate_synonyms_phrase_query" : true,
      "fuzzy_transpositions" : true,
      "boost" : 1.0
    }
  }
}

I get this error:

{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to create query: {\n  \"multi_match\" : {\n    \"query\" : \"1 254898\",\n    \"fields\" : [\n      \"*^1.0\"\n    ],\n    \"type\" : \"most_fields\",\n    \"operator\" : \"OR\",\n    \"slop\" : 0,\n    \"prefix_length\" : 0,\n    \"max_expansions\" : 50,\n    \"zero_terms_query\" : \"NONE\",\n    \"auto_generate_synonyms_phrase_query\" : true,\n    \"fuzzy_transpositions\" : true,\n    \"boost\" : 1.0\n  }\n}",
        "index_uuid": "kZAjfQWHQ6SkNIETBxhfOA",
        "index": "portal-smm-recipient"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "portal-smm-recipient",
        "node": "JEwfXRPHQSWjPn4L4U6AVQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: {\n  \"multi_match\" : {\n    \"query\" : \"1 254898\",\n    \"fields\" : [\n      \"*^1.0\"\n    ],\n    \"type\" : \"most_fields\",\n    \"operator\" : \"OR\",\n    \"slop\" : 0,\n    \"prefix_length\" : 0,\n    \"max_expansions\" : 50,\n    \"zero_terms_query\" : \"NONE\",\n    \"auto_generate_synonyms_phrase_query\" : true,\n    \"fuzzy_transpositions\" : true,\n    \"boost\" : 1.0\n  }\n}",
          "index_uuid": "kZAjfQWHQ6SkNIETBxhfOA",
          "index": "portal-smm-recipient",
          "caused_by": {
            "type": "number_format_exception",
            "reason": "For input string: \"1 254898\""
          }
        }
      }
    ]
  },
  "status": 400
}

What am I doing wrong?

Which version of Elasticsearch is this?

As I mentioned before, if some of your fields are indexed as numeric fields (unlike in your original example, where all fields are text), this approach is not going to work. You need to reindex the numeric fields as text fields or exclude the numeric fields from the search, or you need to ignore them by adding "lenient": "true", as in my example.
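
The number_format_exception in the error above carries the whole query string ("1 254898"), which suggests the numeric field tried to parse the full text as one number. The plain-Java sketch below is a simplified model of that behaviour and of what lenient changes. It is not Elasticsearch code, and the names LenientSketch and matchNumericField are hypothetical.

```java
import java.util.Optional;

public class LenientSketch {

    // Simplified model of matching one query string against one numeric field.
    // Returns Optional.empty() when the field is skipped because the query
    // text is not a number and lenient is true.
    static Optional<Boolean> matchNumericField(String queryText, long fieldValue, boolean lenient) {
        try {
            long parsed = Long.parseLong(queryText); // the whole string, not one word
            return Optional.of(parsed == fieldValue);
        } catch (NumberFormatException e) {
            if (lenient) {
                return Optional.empty(); // ignore this field instead of failing
            }
            throw e; // without lenient, the failure propagates to the caller
        }
    }

    public static void main(String[] args) {
        System.out.println(matchNumericField("18", 18L, false));     // Optional[true]
        System.out.println(matchNumericField("1 254898", 1L, true)); // Optional.empty
        try {
            matchNumericField("1 254898", 1L, false);
        } catch (NumberFormatException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```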

I did it with this code:
QueryBuilder qb = multiMatchQuery(searchParameters, "*").type(MOST_FIELDS).lenient(true);

Then I put 5 documents into one index:

PUT test/doc/1
{
    "name": "Smith",
    "surname": "19",
    "age": "Jhon"
    
}

PUT test/doc/2
{
    "name": "Long",
    "surname": "Jhon",
    "age": "19"
    
}


PUT test/doc/3
{
    "name": "Smith",
    "surname": "Jhon",
    "age": "19"
    
}

PUT test/doc/4
{
    "name": "Jhon",
    "surname": "Smith",
    "age": "19"
    
}

PUT test/doc/5
{
    "name": "1",
    "surname": "Jhon",
    "age": "19"
    
}

ran this query:

POST test/_search
{
  "query": {
    "multi_match" : {
      "query" : "Jhon Smith",
      "fields" : [
        "*^1.0"
      ],
      "type" : "most_fields",
      "operator" : "OR",
      "slop" : 0,
      "prefix_length" : 0,
      "max_expansions" : 50,
      "lenient" : true,
      "zero_terms_query" : "NONE",
      "auto_generate_synonyms_phrase_query" : true,
      "fuzzy_transpositions" : true,
      "boost" : 1.0
    }
  }
}

and got this result:

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1.3862944,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "4",
        "_score": 1.3862944,
        "_source": {
          "name": "Jhon",
          "surname": "Smith",
          "age": "19"
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "2",
        "_score": 0.6931472,
        "_source": {
          "name": "Long",
          "surname": "Jhon",
          "age": "19"
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "3",
        "_score": 0.5753642,
        "_source": {
          "name": "Smith",
          "surname": "Jhon",
          "age": "19"
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "1",
        "_score": 0.36464313,
        "_source": {
          "name": "Smith",
          "surname": "19",
          "age": "Jhon"
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "5",
        "_score": 0.2876821,
        "_source": {
          "name": "1",
          "surname": "Jhon",
          "age": "19"
        }
      }
    ]
  }
}

The scores differ. My logic depends on the score, and I need the same score whenever "Jhon Smith" is found, regardless of which fields the words are in and in what order. So the score should be the same, and the highest, whenever both Jhon and Smith are found. How can I do that?

Which version of Elasticsearch is this?

6.4.1

I solved my task with this:

QueryBuilder qb = multiMatchQuery(searchParameters, "*")
        .type(CROSS_FIELDS)
        .operator(Operator.AND)
        .lenient(true);

The resulting query in Elasticsearch:

    {
      "multi_match" : {
        "query" : "1 19",
        "fields" : [
          "*^1.0"
        ],
        "type" : "cross_fields",
        "operator" : "AND",
        "slop" : 0,
        "prefix_length" : 0,
        "max_expansions" : 50,
        "lenient" : true,
        "zero_terms_query" : "NONE",
        "auto_generate_synonyms_phrase_query" : true,
        "fuzzy_transpositions" : true,
        "boost" : 1.0
      }
    }
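
The intent of cross_fields with operator AND, as used above, is roughly that every word of the query has to be found in at least one field of the document. The plain-Java sketch below models only that matching rule on in-memory maps, assuming whitespace tokenization and lowercasing. It does not reproduce Elasticsearch scoring, and the class and method names are hypothetical. It assumes Java 9 or later for Map.of.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class CrossFieldsSketch {

    // True when every whitespace-separated query term occurs as a token
    // in at least one field value of the document.
    static boolean allTermsFound(String query, Map<String, String> doc) {
        List<String> docTokens = doc.values().stream()
                .flatMap(v -> Arrays.stream(v.trim().toLowerCase(Locale.ROOT).split("\\s+")))
                .collect(Collectors.toList());
        return Arrays.stream(query.trim().toLowerCase(Locale.ROOT).split("\\s+"))
                .allMatch(docTokens::contains);
    }

    public static void main(String[] args) {
        Map<String, String> doc4 = Map.of("name", "Jhon", "surname", "Smith", "age", "19");
        Map<String, String> doc1 = Map.of("name", "Smith", "surname", "19", "age", "Jhon");
        Map<String, String> doc2 = Map.of("name", "Long", "surname", "Jhon", "age", "19");

        System.out.println(allTermsFound("Jhon Smith", doc4)); // true
        System.out.println(allTermsFound("Jhon Smith", doc1)); // true, field order is irrelevant
        System.out.println(allTermsFound("Jhon Smith", doc2)); // false, "smith" is missing
    }
}
```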

But I have a template for my data:

PUT _template/test
{
  "index_patterns": "test",
  "settings": {
    "number_of_shards": 5,
    "analysis": {
      "normalizer": {
        "useLowercase": {
          "type": "custom",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "name": { "normalizer": "useLowercase", "type": "keyword" },
        "surname": { "normalizer": "useLowercase", "type": "keyword" },
        "age": { "type": "long" }
      }
    }
  }
}

I use a lowercase normalizer so that searches are case-insensitive. But I can't use it with multi_match: nothing is found.

I have this data in Elasticsearch:

PUT test/test/1
{
    "name": "Smith",
    "surname": "19",
    "age": 19
    
}

PUT test/test/2
{
    "name": "Long",
    "surname": "Jhon",
    "age": 19
    
}


PUT test/test/3
{
    "name": "Smith",
    "surname": "Jhon",
    "age": 19
    
}

PUT test/test/4
{
    "name": "Jhon",
    "surname": "Smith",
    "age": 19
    
}

PUT test/test/5
{
    "name": "1",
    "surname": "Jhon",
    "age": 19
    
}

and with the query I wrote above, I can't find anything because of the normalizer. What do I have to add or change in this case? I want the search to be case-insensitive and to match if 1 and 19 are in the document, no matter in which fields.

I see. That gets way too complicated compared to the original question, then. I think it would be easier for you to just copy all fields into a single text field using copy_to.
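
Conceptually, copy_to gathers the values of several fields into one extra field, and because that field is of type text, its content is split into tokens when it is indexed. The plain-Java sketch below is a rough model of that idea, assuming the analyzer behaves like "split on whitespace and lowercase" for simple values such as these. The names CopyToSketch, catchAllTokens and matchesAll are hypothetical, and nothing here is Elasticsearch code. It assumes Java 9 or later for List.of.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class CopyToSketch {

    // Models a catch-all text field: all source values are copied into one
    // set of lowercase tokens.
    static Set<String> catchAllTokens(List<String> fieldValues) {
        return fieldValues.stream()
                .flatMap(v -> Arrays.stream(v.trim().split("\\s+")))
                .filter(t -> !t.isEmpty())
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
    }

    // Models a match on the catch-all field where every query word is required.
    static boolean matchesAll(String query, Set<String> tokens) {
        return Arrays.stream(query.trim().toLowerCase(Locale.ROOT).split("\\s+"))
                .allMatch(tokens::contains);
    }

    public static void main(String[] args) {
        // Values of name, surname and age from one document, copied together.
        Set<String> tokens = catchAllTokens(List.of("1", "Jhon", "19"));

        System.out.println(matchesAll("1 19", tokens));     // true
        System.out.println(matchesAll("JHON 19", tokens));  // true, case is ignored
        System.out.println(matchesAll("Smith 19", tokens)); // false
    }
}
```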

As I understand it, I have to use copy_to when creating the index:

PUT test
{
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "keyword",
          "copy_to": "full_name" 
        },
        "surname": {
          "type": "keyword",
          "copy_to": "full_name" 
        },
        "age": {
          "type": "integer"
        },
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}

But I also define a mapping like this when creating the template:

PUT _template/test
{
  "index_patterns": "test",
  "settings": {
    "number_of_shards": 5
  },
  "mappings": {
    "test": {
      "properties": {
        "name": { "type": "keyword" },
        "surname": { "type": "keyword" },
        "age": { "type": "long" }
      }
    }
  }
}

Will the two conflict in some way? Also, with this setup I can't find any documents by the full_name field in the index.

I don't understand what the issue is here. If you are creating the index explicitly, you don't need the template. If you are creating the index with a template, you need to specify copy_to in the template.
