Search not working when adding special characters in query

I have a cluster which stores device logs, and I use a Java client to fetch data into my application. The device logs contain special characters, and as per the documentation, special characters are not analyzed by default, so I was not able to search certain terms that contain them. I therefore created a custom analyzer so that terms with special characters get analyzed the way I want.

Here is how the string is tokenized:

Demo:123 into Demo:123, Demo, 123
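For reference, one way to produce exactly that token stream is a whitespace tokenizer combined with a word_delimiter_graph filter with preserve_original enabled. A sketch of such index settings (the index name and the filter name split_keep_original are hypothetical; the OP's actual analyzer definition was not shared):

```
PUT my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "split_keep_original": {
          "type": "word_delimiter_graph",
          "preserve_original": true
        }
      },
      "analyzer": {
        "custom-analyzer": {
          "tokenizer": "whitespace",
          "filter": ["lowercase", "split_keep_original"]
        }
      }
    }
  }
}
```

With these settings, _analyze on "Demo:123" yields demo:123 (preserved original, positionLength 2), demo, and 123, matching the output shown later in this thread.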

This is how I want my data to be analyzed. The issue arises when I use the Java API client to fetch the data:
the term query to fetch the data of Demo:123 does not work.

{
    "size": 300,
    "query": {
        "term": {
            "string_field": {
                "value": "Demo:123",
                "boost": 1.0
            }
        }
    }
}

This gives me a null result, but the same query works in Kibana Dev Tools. It is not that the query from the Java API client is malformed or anything: as soon as I remove the special character ":" (colon), the same query works and returns results.
So I don't know why the query sent through the Java API client returns no result while the same query works in Kibana Dev Tools.
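As background on the mechanics involved here: a term query is not analyzed at search time, while the indexed tokens have been transformed by the index analyzer. The following is a toy sketch in plain Java (not the actual Lucene lookup; the lowercase + preserve_original behaviour is an assumption) of why a raw, unanalyzed value can miss tokens that the analyzer changed:

```java
import java.util.*;

public class TermVsMatchSketch {
    // Mimic a custom analyzer: lowercase, keep the original token,
    // and also split on non-alphanumeric characters (e.g. the colon).
    static Set<String> analyze(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        Set<String> tokens = new LinkedHashSet<>();
        tokens.add(lower); // preserve_original
        tokens.addAll(Arrays.asList(lower.split("[^a-z0-9]+")));
        return tokens;
    }

    // A term query compares the raw search value against the indexed tokens.
    static boolean termQuery(Set<String> indexed, String value) {
        return indexed.contains(value);
    }

    // A match query analyzes the search value first, then looks the tokens up.
    static boolean matchQuery(Set<String> indexed, String value) {
        return analyze(value).stream().anyMatch(indexed::contains);
    }

    public static void main(String[] args) {
        Set<String> indexed = analyze("Demo:123"); // demo:123, demo, 123
        System.out.println(termQuery(indexed, "Demo:123"));  // false: raw value was never lowercased
        System.out.println(termQuery(indexed, "demo:123"));  // true: exact token exists
        System.out.println(matchQuery(indexed, "Demo:123")); // true: analyzed at search time
    }
}
```

This is only a model of the term-vs-match distinction, not a claim about what differs between the Java client and Kibana in this particular case.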

Java Version: 8

Java HLRC version: 7.17.6

ES version: 8.11.3

Could you share your code?
Also, please upgrade your client to the new Java API Client, using the same version as the cluster. Speaking of which, you should also consider upgrading to 8.15.

What part of the code do you need?

As for updating the Java client and cluster version, that is also in the works.

A full reproduction repository would be better, but let's at least start with some pieces of code.
The search call is the minimum part of the code.

Please find the necessary search code below; in this version I am not analyzing the tokens:

List<String> indexes = Arrays.asList("index1", "index2");
String startTime = "2024-09-23T10:05:45Z";
String endTime = "2024-09-23T10:20:45Z";

SearchRequest searchRequest = new SearchRequest();
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.sort(sortField, sortingOrder);

BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();

// one term clause per comma-separated search token
String[] tokenArray = tokens.split(",");
for (String token : tokenArray) {
    boolQueryBuilder.must(QueryBuilders.termQuery("message_field", token));
}

// restrict to the requested time window
boolQueryBuilder.must(QueryBuilders.rangeQuery("Timestamp")
        .from(startTime)
        .to(endTime)
        .format("date_optional_time"));

searchSourceBuilder.query(boolQueryBuilder)
        .size(size);
searchRequest.indices(indexes.toArray(new String[0]))
        .indicesOptions(IndicesOptions.lenientExpandOpen())
        .source(searchSourceBuilder)
        .scroll(new TimeValue(60000));

RestHighLevelClient client = getElasticsearchClient();
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
closeClient(client);

The rest of the code remains the same. For the analysis of the search tokens, I use the following method with the standard analyzer:

client.indices().analyze(analyzeBuilder)

The search request you initially shared does not match the Java code you shared.

        "term": {
            "string_field": {
                "value": "Demo:123",
                "boost": 1.0
            }
        }

Here, there is a single term query on a string_field field.

On the Java code:

A bool query

BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();

With a list of must clauses applied on message_field field:

for (String token : tokenArray){
	boolQueryBuilder.must(QueryBuilders.termQuery("message_field", token));
}

Plus a must clause with dates:

boolQueryBuilder.must(QueryBuilders.rangeQuery(Timestamp).from(startTime)
				.to(endTime).format(date_optional_time));

So you are comparing apples and oranges, which cannot help.
Please share the right information and we might find what the problem is.

Have a look at the Elastic Stack and Solutions Help · Forums and Slack | Elastic page. It also contains a lot of useful information on how to ask for help.

My apologies.
The code was from production, and so was the query, but I have sanitized it. The query below is the one produced by the code; the search request is correct, I have verified it.
The whole point was that the initial query I posted was fired from Postman, and even then I received no records.

Please find the sanitized query below

{
  "bool" : {
    "must" : [
      {
        "term" : {
          "message_field" : {
            "value" : "demo:123",
            "boost" : 1.0
          }
        }
      },
      {
        "range" : {
          "Timestamp" : {
            "startTime" : "2024-09-24T08:26:15.403",
            "endTime" : "2024-09-24T08:41:15.403",
            "include_lower" : true,
            "include_upper" : true,
            "format" : "date_optional_time",
            "boost" : 1.0
          }
        }
      }
    ],
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
}

This query does not fetch results, but the same query fired from Kibana Dev Tools fetches valid results. The catch here is that the "demo:123" string is not analyzed in this query.
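As an aside: if analyzing the search string is the desired behaviour, a match query (unlike term) runs the input through the search analyzer before looking up tokens. A sketch against the field named in this thread (the index name is a placeholder):

```
GET <index>/_search
{
  "query": {
    "match": {
      "message_field": "demo:123"
    }
  }
}
```

Whether that is appropriate here depends on whether matches on the individual sub-tokens (demo, 123) are acceptable.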

But the query formed when the same search string is analyzed looks like this:

{
  "bool" : {
    "must" : [
      {
        "term" : {
          "message_field" : {
            "value" : "demo",
            "boost" : 1.0
          }
        }
      },
      {
        "term" : {
          "message_field" : {
            "value" : "123",
            "boost" : 1.0
          }
        }
      },
      {
        "range" : {
          "@timestamp" : {
            "startTime" : "2024-09-24T08:44:02.197",
            "endTime" : "2024-09-24T08:59:02.204",
            "include_lower" : true,
            "include_upper" : true,
            "format" : "date_optional_time",
            "boost" : 1.0
          }
        }
      }
    ],
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
}

This query fetches results, but not valid ones: the two separate term clauses match any document containing both demo and 123, not necessarily demo:123.

I have checked the tokenization from the cluster side with the following analyze request:

GET <index>/_analyze
{
  "text":"demo:123",
  "analyzer":"custom-analyzer"
}


{
  "tokens": [
    {
      "token": "demo:123",
      "start_offset": 0,
      "end_offset": 8,
      "type": "word",
      "position": 0,
      "positionLength": 2
    },
    {
      "token": "demo",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "123",
      "start_offset": 5,
      "end_offset": 8,
      "type": "word",
      "position": 1
    }
  ]
}

Please tell me if anything else is needed

Please also share a document which is supposed to match.

But again it's not consistent: Timestamp vs @timestamp.

Please provide a full recreation script as described in Elastic Stack and Solutions Help · Forums and Slack | Elastic.
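For illustration, a minimal recreation script might look like the following; the index name, analyzer definition, and document are assumptions standing in for the real ones, and would need to be replaced with the actual mapping and data:

```
PUT test-index
{
  "settings": {
    "analysis": {
      "filter": {
        "split_keep_original": {
          "type": "word_delimiter_graph",
          "preserve_original": true
        }
      },
      "analyzer": {
        "custom-analyzer": {
          "tokenizer": "whitespace",
          "filter": ["lowercase", "split_keep_original"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "message_field": { "type": "text", "analyzer": "custom-analyzer" },
      "@timestamp": { "type": "date" }
    }
  }
}

POST test-index/_doc?refresh=true
{ "message_field": "Demo:123", "@timestamp": "2024-09-24T08:45:00Z" }

GET test-index/_search
{ "query": { "term": { "message_field": "demo:123" } } }
```

A script like this, pasted into Kibana Dev Tools, lets anyone reproduce the behaviour end to end.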