Problem with Hebrew search


(OlgaT) #1

Hi,

I have a problem finding documents by Hebrew words.
When I create the query, I encode it to UTF-8:

QueryStringQueryBuilder textQueryBuilder =
        new QueryStringQueryBuilder(new String(query.getBytes(), "UTF-8"));

It finds the documents that contain the Hebrew word, but it also finds
other documents that contain unrelated English words.
For example, when I search for שלום, Elasticsearch also returns documents
that contain "Olga".

What could be the problem?
Thank you,
Olga.
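Java Strings are already Unicode (UTF-16 internally), so there is nothing to "encode to UTF-8" before building the query. The round trip in the code above encodes with the platform default charset (getBytes() with no argument) and then decodes as UTF-8: a no-op when the default happens to be UTF-8, and silent corruption of non-ASCII text when it is not. A minimal JDK-only sketch; the class and method names are hypothetical, and ISO-8859-1 stands in for a non-UTF-8 platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    // Mimics new String(query.getBytes(), "UTF-8"), but makes the encoding
    // charset explicit instead of relying on the platform default.
    static String roundTrip(String query, Charset encodeWith) {
        byte[] bytes = query.getBytes(encodeWith);        // String -> bytes
        return new String(bytes, StandardCharsets.UTF_8); // bytes -> String, read as UTF-8
    }

    public static void main(String[] args) {
        // The Hebrew word shalom, written with escapes so the source file
        // compiles regardless of the compiler's source encoding.
        String query = "\u05E9\u05DC\u05D5\u05DD";
        System.out.println("platform default: " + Charset.defaultCharset());

        // Encoder and decoder agree: the string comes back unchanged.
        System.out.println(roundTrip(query, StandardCharsets.UTF_8).equals(query)); // true

        // ISO-8859-1 has no Hebrew letters, so getBytes() replaces each one
        // with '?', and the query that reaches the builder is "????".
        System.out.println(roundTrip(query, StandardCharsets.ISO_8859_1));
    }
}
```

Passing the original String straight to the query builder avoids the question of which charset the JVM defaults to.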


(Shay Banon) #2

Can you gist sample code that reproduces it? I can try to chase it down.

(in reply to OlgaT, Wed, Oct 26, 2011 at 4:09 PM)


(OlgaT) #3

It appears that we had indexed HTML-encoded data, and that was apparently the
problem. However, after we removed the encoding, we can't find any Hebrew
words through the Java API, although we can find them with the following
curl command:

curl -XGET http://localhost:9200/default/conversation/_search -d '{"query" : "שלום"}'

The Java sample code:

String queryString = "שלום";
SearchRequestBuilder searchRequestBuilder = client.prepareSearch(tenantName)
        .setTypes(type)
        .setSearchType(SearchType.QUERY_THEN_FETCH)
        .setQuery(queryString);
SearchResponse response = searchRequestBuilder.execute().actionGet();

What could be the problem?
Thanks,
Olga
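If the indexed text was HTML-encoded with numeric character references (an assumption; the thread only says "HTML encoded"), the index holds entity text such as "&#1513;" rather than Hebrew letters, so a query for the raw word has nothing to match. The JDK-only sketch below uses a hypothetical toNumericEntities helper, standing in for whatever encoder was actually used, to show what such an encoder produces for the query word.

```java
public class HtmlEntities {

    // Hypothetical helper: replaces every non-ASCII code point with an HTML
    // numeric character reference (&#NNNN;) and leaves ASCII untouched.
    static String toNumericEntities(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "\u05E9\u05DC\u05D5\u05DD"; // the Hebrew word shalom
        // Prints &#1513;&#1500;&#1493;&#1501;
        System.out.println(toNumericEntities(raw));
        // ASCII text such as a name passes through unchanged: prints Olga
        System.out.println(toNumericEntities("Olga"));
    }
}
```

Whatever the analyzer then makes of that entity text, none of its tokens is the Hebrew word itself, which is consistent with the searches improving once the data was indexed without encoding.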


(Shay Banon) #4

You need to wrap the query string you pass in a QueryBuilders.queryString
construct. When you pass a plain string to the setQuery method, it is
expected to be JSON. Check the failed shards on the SearchResponse you get
back; you will see that they all failed because the query could not be parsed.

(in reply to OlgaT, Mon, Oct 31, 2011 at 8:52 AM)


(OlgaT) #5

Oh, of course I use QueryBuilders.queryString(query), but one step earlier.
Here is the code I use:

QueryStringQueryBuilder textQueryBuilder = QueryBuilders.queryString(query);
searchParams.setQueryString(new String(textQueryBuilder.buildAsBytes()));

I think "new String(textQueryBuilder.buildAsBytes())" is the problem:
I should pass the "UTF-8" charset when creating the String.

Thanks a lot.
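This diagnosis can be checked without a cluster. new String(byte[]) with no charset decodes with the platform default, so if the builder's bytes are UTF-8 (an assumption about buildAsBytes(), which the thread does not state) and the default is something else, the Hebrew text is mangled before the query is ever sent. The JDK-only sketch below uses a hand-written query_string JSON as a stand-in for the builder output; the class and method names are hypothetical, and ISO-8859-1 stands in for a non-UTF-8 default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class JsonBytesDecoding {

    // Stand-in for the serialized query builder, assumed to be UTF-8 JSON.
    // The real builder output may contain additional fields.
    static byte[] queryBytes(String word) {
        String json = "{\"query_string\":{\"query\":\"" + word + "\"}}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String word = "\u05E9\u05DC\u05D5\u05DD"; // the Hebrew word shalom
        byte[] bytes = queryBytes(word);

        // Decoding with the charset used for encoding restores the query.
        System.out.println(decode(bytes, StandardCharsets.UTF_8).contains(word)); // true

        // Decoding UTF-8 bytes as ISO-8859-1 turns each two-byte Hebrew letter
        // into two unrelated Latin-1 characters, so the word is lost.
        System.out.println(decode(bytes, StandardCharsets.ISO_8859_1).contains(word)); // false
    }
}
```

Supplying the charset explicitly, new String(bytes, "UTF-8"), corresponds to the first case; passing the builder object itself to setQuery skips the bytes-to-String step altogether.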

(in reply to Shay Banon, Oct 31, 7:14 pm)


(Shay Banon) #6

You can also pass the query builder directly to the SearchRequestBuilder;
that will work better.

(in reply to OlgaT, Tue, Nov 1, 2011 at 8:57 AM)


(system) #7