Question about FS rivers, synonyms and the Elasticsearch Java API

yahia · March 15, 2013, 9:02am

I use rivers (the FS river and the JDBC river) to index documents, and the
Java API for keyword search. I want to use synonyms in my searches.

For example, when I type the word Application, documents that contain the
word ios or windows should appear in the list of results.

Can I load a dictionary containing all the synonyms at search time?

I have the following query:

    QueryBuilder query = QueryBuilders.queryString(keyword);

    SearchResponse searchHits = esClient.prepareSearch()
            .setIndices(INDEX_NAME_DOC).setTypes(INDEX_TYPE_DOC)
            .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
            .setFrom(start).setSize(size)
            .setQuery(query)
            .addHighlightedField("name")
            .addHighlightedField("file")
            .execute().actionGet();

How can I modify it to accept synonyms?

How should I proceed?

Regards.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

dadoonet · March 15, 2013, 9:05am

IMHO, to use synonyms you should define them in the mapping before indexing.

That way, a document containing "word" will be indexed under "microsoft", for example.
When searching, Elasticsearch will apply the same analyzer: if you search for "word", your search will be converted to "microsoft" and you will find your doc.

My 2 cents
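
The idea in the reply above (the same analyzer runs at index time and at search time) can be illustrated with a toy model. This is a minimal sketch in plain Java, not Elasticsearch code: the class name, the sample documents and the single rule "word => microsoft" are assumptions made up for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model of analysis with a synonym rule. This is NOT Elasticsearch code:
// it only illustrates why index-time and search-time analysis must agree.
final class SynonymAnalysisSketch {

    // Hypothetical rule in the spirit of the reply above: "word => microsoft".
    private static final Map<String, String> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("word", "microsoft");
    }

    // Inverted index: analyzed term -> ids of the documents containing it.
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Lowercase, split on whitespace, then rewrite each token through the rule.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                tokens.add(SYNONYMS.getOrDefault(raw, raw));
            }
        }
        return tokens;
    }

    // Index time: store the document id under every analyzed term.
    void indexDocument(int id, String text) {
        for (String term : analyze(text)) {
            index.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(id);
        }
    }

    // Search time: the query goes through the same analyzer before the lookup.
    Set<Integer> search(String query) {
        Set<Integer> hits = new LinkedHashSet<>();
        for (String term : analyze(query)) {
            hits.addAll(index.getOrDefault(term, new LinkedHashSet<>()));
        }
        return hits;
    }

    public static void main(String[] args) {
        SynonymAnalysisSketch sketch = new SynonymAnalysisSketch();
        sketch.indexDocument(1, "A report written in Word");
        sketch.indexDocument(2, "Microsoft quarterly results");
        sketch.indexDocument(3, "Unrelated text");
        System.out.println(analyze("Word"));       // prints [microsoft]
        System.out.println(sketch.search("word")); // prints [1, 2]
    }
}
```

Because both the documents and the query are rewritten by the same rule, a search for "word" finds the document that only says "Microsoft". If the field were indexed with one analyzer and searched with another, the terms would no longer line up.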

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs


yahia · March 15, 2013, 9:15am

Thanks for the reply. I use the FS river to index documents. How can I change
the mapping so that it accepts synonyms when I create the river?


dadoonet · March 15, 2013, 9:36am

You have to create the mapping before creating the river. See the FS river project on GitHub: dadoonet/fscrawler, Elasticsearch File System Crawler (FS Crawler).
About synonyms, see the Elasticsearch documentation.

Note that the required steps are:
1/ create index with its analyzer
2/ create mapping that will use this analyzer
3/ create the river

HTH
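
The order of the three steps above can be sketched with plain Java and HttpURLConnection. This is only an illustration under stated assumptions: the class name is made up, the files settings.json, mapping.json and river.json are hypothetical placeholders for the request bodies, and the index, type and river names passed in main are examples.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Sketch of the required order: index settings, then mapping, then river.
final class RiverSetupOrder {

    // Each step is {REST path, body file}. The file names are hypothetical.
    static List<String[]> steps(String index, String type, String river) {
        return Arrays.asList(
            new String[] {"/" + index, "settings.json"},                          // 1/ index with its analyzer
            new String[] {"/" + index + "/" + type + "/_mapping", "mapping.json"}, // 2/ mapping using the analyzer
            new String[] {"/_river/" + river + "/_meta", "river.json"});           // 3/ river
    }

    // Sends one PUT request and returns the HTTP status code.
    static int put(String url, byte[] body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        String base = "http://localhost:9200"; // assumed local node
        for (String[] step : steps("mydocu", "docu", "riverdocu")) {
            byte[] body = Files.readAllBytes(Paths.get(step[1]));
            System.out.println("PUT " + step[0] + " -> " + put(base + step[0], body));
        }
    }
}
```

Keeping the steps in one ordered list makes it harder to create the river before the mapping exists, which is the situation the reply warns about.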


yahia · March 15, 2013, 3:11pm

I created an index with a simple analyzer that maps application =>
applications:

curl -XPUT 'http://localhost:9200/mydocu/' -d '{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "synonym" : { "tokenizer" : "whitespace", "filter" : ["synonym"] }
        },
        "filter" : {
          "synonym" : {
            "type" : "synonym",
            "ignore_case" : true,
            "synonyms" : ["application => applications"]
          }
        }
      }
    }
  }
}'

and I used the following mapping:

curl -XPUT 'http://localhost:9200/mydocu/docu/_mapping' -d '{
  "docu" : {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "path" : "full",
        "fields" : {
          "file" : {
            "type" : "string",
            "store" : "yes",
            "term_vector" : "with_positions_offsets",
            "index" : "analyzed",
            "analyzer" : "french"
          },
          "author" : { "type" : "string" },
          "title" : { "type" : "string", "store" : "yes" },
          "name" : { "type" : "string" },
          "date" : { "type" : "date", "format" : "dateOptionalTime" },
          "keywords" : { "type" : "string" },
          "content_type" : { "type" : "string" }
        }
      },
      "name" : { "type" : "string", "analyzer" : "keyword" },
      "pathEncoded" : { "type" : "string", "analyzer" : "keyword" },
      "postDate" : { "type" : "date", "format" : "dateOptionalTime" },
      "rootpath" : { "type" : "string", "analyzer" : "keyword" },
      "virtualpath" : { "type" : "string", "analyzer" : "keyword" }
    }
  }
}'

and I created the following river:

curl -XPUT 'localhost:9200/_river/riverdocu/_meta' -d '{
  "type": "fs",
  "fs": {
    "name": "document river",
    "url": "C:\\tempDoc",
    "update_rate": 180000,
    "includes": [ ".doc", ".xls", ".pdf", ".txt" ]
  },
  "index": {
    "index": "mydocu",
    "type": "docu"
  }
}'

but I get this error when the river tries to search for documents:

[2013-03-15 16:01:10,015][DEBUG][action.search.type ] [Arcademan] [1] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [Loa][inet[/172.16.10.61:9301]][search/phase/fetch/id]
Caused by: org.elasticsearch.indices.TypeMissingException: [_river] type[riverdocu] missing: failed to find type loaded for doc [_meta]
    at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:165)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:438)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:634)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:623)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:268)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)


dadoonet · March 15, 2013, 3:20pm

Did you clean the _river index before redoing all your tests?
It sounds like a _river doc is still lingering somewhere in your cluster.

Are you running a multi-node cluster or a standalone node?


yahia · March 15, 2013, 3:28pm

I'm in standalone mode, and I use two FS rivers named newriver1 and mydocs.
Should I delete those rivers before creating my new river, riverdocu?


dadoonet · March 15, 2013, 3:54pm

Hmmmm. No, I don't think you have to.
That said, if you are running tests, I suggest that you clean your environment every time before running new tests.

I can't say why this happens.

Are you sending curl commands or doing this from Java?
Are you waiting a little (for the cluster to reach yellow status, for example) after the index creation?
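
Waiting for yellow status, as suggested above, can be done through the cluster health endpoint. The following is a minimal sketch in plain Java, assuming a node on localhost:9200, the index name mydocu, and an arbitrary 30 second timeout; the class and method names are made up for the illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Scanner;

// Sketch: ask the cluster health API to block until an index is at least yellow.
final class WaitForYellowSketch {

    // Health URL for one index; the 30s timeout is an arbitrary assumption.
    static String healthUrl(String base, String index) {
        return base + "/_cluster/health/" + index + "?wait_for_status=yellow&timeout=30s";
    }

    // Returns the raw JSON answer. The caller should still inspect its
    // "status" and "timed_out" fields; a non-2xx answer raises an IOException.
    static String health(String base, String index) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(healthUrl(base, index)).toURL().openConnection();
        conn.setRequestMethod("GET");
        try (InputStream in = conn.getInputStream();
             Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(health("http://localhost:9200", "mydocu"));
    }
}
```

With the Java client, the cluster health request builder offers setWaitForYellowStatus() for the same purpose.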


yahia · March 15, 2013, 4:52pm

It works: I cleaned the environment and now it works. But now I have another
problem :-p

When I type the word application, I want documents that contain the word
applications to be returned as well, but the result list only contains
documents with the word application.

I have the following index with its analyzer:

curl -XPUT 'http://localhost:9200/docum/' -d '{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "synonym" : { "tokenizer" : "whitespace", "filter" : ["synonym"] }
        },
        "filter" : {
          "synonym" : {
            "type" : "synonym",
            "ignore_case" : true,
            "synonyms" : ["application => applications"]
          }
        }
      }
    }
  }
}'

and the following mapping:

curl -XPUT 'http://localhost:9200/docum/mydocu/_mapping' -d '{
  "mydocu" : {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "path" : "full",
        "fields" : {
          "file" : {
            "type" : "string",
            "store" : "yes",
            "term_vector" : "with_positions_offsets",
            "index" : "analyzed"
          },
          "author" : { "type" : "string" },
          "title" : { "type" : "string", "store" : "yes" },
          "name" : { "type" : "string" },
          "date" : { "type" : "date", "format" : "dateOptionalTime" },
          "keywords" : { "type" : "string" },
          "content_type" : { "type" : "string" }
        }
      },
      "name" : { "type" : "string", "analyzer" : "keyword" },
      "pathEncoded" : { "type" : "string", "analyzer" : "keyword" },
      "postDate" : { "type" : "date", "format" : "dateOptionalTime" },
      "rootpath" : { "type" : "string", "analyzer" : "keyword" },
      "virtualpath" : { "type" : "string", "analyzer" : "keyword" }
    }
  }
}'

Did I make a mistake in the index settings or in the mapping?
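
One thing worth checking in the mapping above: no field references the "synonym" analyzer declared in the index settings, so the synonym filter may never be applied to the indexed text. The sketch below builds the "file" sub-field definition from that mapping with one addition, an explicit "analyzer" entry. This is an assumed fix that the thread does not confirm, and the class and method names are made up for the illustration.

```java
// Builds the "file" sub-field definition from the mapping above, plus an
// explicit "analyzer" entry (an assumption, not confirmed in the thread).
final class FileFieldMapping {

    // Returns the JSON fragment for the sub-field, using the given analyzer name.
    static String fileField(String analyzer) {
        return "\"file\" : {"
            + "\"type\" : \"string\", "
            + "\"store\" : \"yes\", "
            + "\"term_vector\" : \"with_positions_offsets\", "
            + "\"index\" : \"analyzed\", "
            + "\"analyzer\" : \"" + analyzer + "\"}";
    }

    public static void main(String[] args) {
        // "synonym" is the analyzer name declared in the index settings above.
        System.out.println(fileField("synonym"));
    }
}
```

Note also that a query string query with no field specified searches the _all field by default, and _all uses the default analyzer unless configured otherwise. It may therefore be necessary to target the analyzed field explicitly, for example through the query builder's defaultField option.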


yahia · March 15, 2013, 4:56pm

I also use this code to search:

    QueryBuilder query = QueryBuilders.queryString(keyword);
    SearchResponse searchHits = esClient.prepareSearch()
            .setIndices(INDEX_NAME).setTypes(INDEX_TYPE_DOC)
            .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
            .setFrom(start).setSize(size)
            .setQuery(query)
            .addHighlightedField("name")
            .addHighlightedField("file")
            .setHighlighterPreTags("")
            .setHighlighterPostTags("")
            .execute().actionGet();
