Question about FS rivers, synonyms and the Elasticsearch Java API

I use rivers (the FS river and the JDBC river) to index documents, and the Java API
for keyword search. I want to use synonyms in my searches.

For example, when I type the word "application", documents that contain the
word "ios" or "windows" should also appear in the results.

Can I load a dictionary at search time that contains all the
synonyms?

I have the following query:

    QueryBuilder query = QueryBuilders.queryString(keyword);

    SearchResponse searchHits = esClient.prepareSearch()
            .setIndices(INDEX_NAME_DOC).setTypes(INDEX_TYPE_DOC)
            .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
            .setFrom(start).setSize(size)
            .setQuery(query)
            .addHighlightedField("name")
            .addHighlightedField("file")
            .execute().actionGet();

How can I modify it to take synonyms into account?

How should I proceed?

Cordially.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

IMHO, to use synonyms you should define them in the mapping before indexing.

That way, a document containing "word" will be indexed under "microsoft", for example.
When searching, Elasticsearch applies the same analyzer: if you search for "word", your search is converted to "microsoft" and you will find your document.
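To make the idea concrete, here is a toy model in plain Java (it is not Elasticsearch code). A single `analyze` function stands in for an analyzer with a hypothetical one-way synonym rule `word => microsoft`, and it runs both when a document is indexed and when the query is analyzed, so the query term and the indexed term end up identical. The class and method names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class SameAnalyzerDemo {
    // Hypothetical one-way synonym rule, like "word => microsoft" in a synonym filter.
    static final Map<String, String> SYNONYMS = Map.of("word", "microsoft");

    // Toy analyzer: whitespace tokenizer, lowercasing, then synonym replacement.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) continue;
            String token = raw.toLowerCase(Locale.ROOT);
            tokens.add(SYNONYMS.getOrDefault(token, token));
        }
        return tokens;
    }

    // Index time: the document is stored under its analyzed terms.
    static Set<String> index(String document) {
        return new HashSet<>(analyze(document));
    }

    // Search time: the query goes through the same analyzer before the lookup.
    static boolean matches(Set<String> indexedTerms, String query) {
        for (String term : analyze(query)) {
            if (indexedTerms.contains(term)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> doc = index("I wrote this in Word");
        System.out.println(doc);                      // contains "microsoft", not "word"
        System.out.println(matches(doc, "word"));      // true
        System.out.println(matches(doc, "Microsoft")); // true
    }
}
```

In real Elasticsearch this only happens if the field's mapping names the synonym analyzer (or it is configured as the index default), which is why the mapping has to be in place before indexing.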

My 2 cents

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

(In reply to Ammar Yahia, yahia.ammar.info@gmail.com, 15 March 2013 at 10:02.)


--

Thanks for the reply. I use the FS river to index documents; how can I change the
mapping when I create the river so that it takes synonyms into account?

--

You have to create the mapping before creating the river. See: GitHub - dadoonet/fscrawler: Elasticsearch File System Crawler (FS Crawler)
About synonyms, see the Elasticsearch documentation on the Elastic website (elastic.co).

Note that the required steps are:
1/ create the index with its analyzer
2/ create the mapping that uses this analyzer
3/ create the river
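For steps 1/ and 2/, the request bodies could be assembled as in this plain-Java sketch, which only builds JSON strings (the class and helper names are hypothetical). The key point is that the field's `analyzer` in the mapping must name the custom analyzer declared in the settings (here `synonym`); otherwise the synonym filter is never used. The sketch also adds a `lowercase` filter before `synonym` and uses a plain string field called `content`; both are assumptions for illustration, not part of the original setup. The bodies can then be sent with curl or the Java client.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SynonymIndexBodies {
    // Minimal JSON string quoting (enough for these simple values; not a full JSON encoder).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Step 1/: index settings declaring a synonym filter and an analyzer that uses it.
    static String indexSettings(List<String> synonymRules) {
        String rules = synonymRules.stream()
                .map(SynonymIndexBodies::quote)
                .collect(Collectors.joining(", "));
        return "{\"settings\": {\"index\": {\"analysis\": {"
                + "\"analyzer\": {\"synonym\": {\"tokenizer\": \"whitespace\", "
                + "\"filter\": [\"lowercase\", \"synonym\"]}},"
                + "\"filter\": {\"synonym\": {\"type\": \"synonym\", \"ignore_case\": true, "
                + "\"synonyms\": [" + rules + "]}}"
                + "}}}}";
    }

    // Step 2/: a mapping whose text field refers to that analyzer by name.
    static String mapping(String type, String field, String analyzer) {
        return "{" + quote(type) + ": {\"properties\": {" + quote(field)
                + ": {\"type\": \"string\", \"analyzer\": " + quote(analyzer) + "}}}}";
    }

    public static void main(String[] args) {
        System.out.println(indexSettings(List.of("application, applications")));
        System.out.println(mapping("docu", "content", "synonym"));
    }
}
```

Keeping the analyzer name in one place on each side makes it easy to spot a mismatch between the settings and the mapping, which is the usual reason synonyms "do nothing".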

HTH

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

(In reply to Ammar Yahia, yahia.ammar.info@gmail.com, 15 March 2013 at 10:15.)


--

I created an index with a simple synonym analyzer (application =>
applications):

curl -XPUT 'http://localhost:9200/mydocu/' -d '{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "synonym" : {
            "tokenizer" : "whitespace",
            "filter" : ["synonym"]
          }
        },
        "filter" : {
          "synonym" : {
            "type" : "synonym",
            "ignore_case" : true,
            "synonyms" : ["application => applications"]
          }
        }
      }
    }
  }
}'
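The rule syntax matters here. In the Solr-style format accepted by the `synonym` filter, `application => applications` is one-way: the left-hand term is replaced by the right-hand one. `application, applications` declares the terms equivalent and, with the filter's default `expand` behaviour, emits all of them. The toy Java model below only mimics those two behaviours on a list of tokens; it is not the Lucene implementation, and the class name is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SynonymRuleDemo {
    // Toy parser for two Solr-style rule forms (a simplification, not Lucene's parser):
    //   "a => b" : one-way, a is replaced by b
    //   "a, b"   : equivalence, each word expands to every word in the group
    static Map<String, List<String>> parse(List<String> rules) {
        Map<String, List<String>> map = new HashMap<>();
        for (String rule : rules) {
            if (rule.contains("=>")) {
                String[] sides = rule.split("=>");
                List<String> targets = trimAll(sides[1].split(","));
                for (String source : trimAll(sides[0].split(","))) {
                    map.put(source, targets);
                }
            } else {
                List<String> group = trimAll(rule.split(","));
                for (String word : group) {
                    map.put(word, group);
                }
            }
        }
        return map;
    }

    static List<String> trimAll(String[] parts) {
        List<String> out = new ArrayList<>();
        for (String p : parts) out.add(p.trim());
        return out;
    }

    // Apply the rules to a token stream; tokens without a rule pass through unchanged.
    static List<String> apply(Map<String, List<String>> rules, List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.addAll(rules.getOrDefault(t, List.of(t)));
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> oneWay = parse(List.of("application => applications"));
        Map<String, List<String>> equivalent = parse(List.of("application, applications"));
        List<String> tokens = Arrays.asList("application", "ios");
        System.out.println(apply(oneWay, tokens));     // [applications, ios]
        System.out.println(apply(equivalent, tokens)); // [application, applications, ios]
    }
}
```

Either form can make "application" find "applications", but only if the same rules are applied when the documents are indexed and when the query is analyzed; a one-way rule applied on one side only leaves the two sides with different terms.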

and I used the following mapping:

curl -XPUT 'http://localhost:9200/mydocu/docu/_mapping' -d '
{
  "docu" : {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "path" : "full",
        "fields" : {
          "file" : {
            "type" : "string",
            "store" : "yes",
            "term_vector" : "with_positions_offsets",
            "index" : "analyzed",
            "analyzer" : "french"
          },
          "author" : {"type" : "string"},
          "title" : {"type" : "string", "store" : "yes"},
          "name" : {"type" : "string"},
          "date" : {"type" : "date", "format" : "dateOptionalTime"},
          "keywords" : {"type" : "string"},
          "content_type" : {"type" : "string"}
        }
      },
      "name" : {"type" : "string", "analyzer" : "keyword"},
      "pathEncoded" : {"type" : "string", "analyzer" : "keyword"},
      "postDate" : {"type" : "date", "format" : "dateOptionalTime"},
      "rootpath" : {"type" : "string", "analyzer" : "keyword"},
      "virtualpath" : {"type" : "string", "analyzer" : "keyword"}
    }
  }
}'

And I created the following river:

curl -XPUT 'localhost:9200/_river/riverdocu/_meta' -d '{
  "type": "fs",
  "fs": {
    "name": "document river",
    "url": "C:\\tempDoc",
    "update_rate": 180000,
    "includes": [ ".doc" , ".xls", ".pdf", ".txt" ]
  },
  "index": {
    "index": "mydocu",
    "type": "docu"
  }
}'
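One detail worth checking in river definitions on Windows: inside a JSON string such as the river's `url`, a backslash starts an escape sequence, so `"C:\tempDoc"` is read as `C:`, a tab character, then `empDoc`. Java string literals use the same `\t` escape, which makes the problem easy to demonstrate. The hypothetical `jsonString` helper below doubles backslashes (and escapes double quotes) before embedding a path, which yields `"C:\\tempDoc"` in the JSON; it is a sketch and does not handle control characters.

```java
public class JsonPathEscaping {
    // Quote a value as a JSON string: escape backslashes first, then double quotes.
    // (Sketch only: control characters are not escaped.)
    static String jsonString(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        String broken = "C:\tempDoc";   // "\t" is a TAB, just as a JSON parser reads it
        String path = "C:\\tempDoc";    // the actual Windows path C:\tempDoc
        System.out.println(broken.indexOf('\t'));   // 2
        System.out.println(jsonString(path));       // "C:\\tempDoc"
        System.out.println("{\"fs\": {\"url\": " + jsonString(path) + "}}");
    }
}
```

Note also that standard JSON does not allow a trailing comma after the last member of an object, so a body like `"type": "docu", }` may be rejected depending on the parser.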

But I get this error when the river tries to search for documents:

[2013-03-15 16:01:10,015][DEBUG][action.search.type ] [Arcademan] [1] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [Loa][inet[/172.16.10.61:9301]][search/phase/fetch/id]
Caused by: org.elasticsearch.indices.TypeMissingException: [_river] type[riverdocu] missing: failed to find type loaded for doc [_meta]
    at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:165)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:438)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:634)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:623)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:268)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

--

Did you clean the _river index before redoing all your tests?
It sounds like a _river document is still left over somewhere in your cluster.

Are you running a multi-node cluster or a standalone node?

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

(In reply to Ammar Yahia, yahia.ammar.info@gmail.com, 15 March 2013 at 16:11.)


--

I'm running standalone, and I already use two FS rivers named newriver1 and mydocs.
Should I delete those rivers before creating my new river, riverdocu?

--

Hmmmm. No. I don't think you have to.
That said, if you are running tests, I suggest cleaning your environment every time before running new tests.

I can't tell from here why this happens.

Are you sending curl commands or doing this from Java?
Are you waiting a little after creating the index (for example, waiting for the cluster to reach yellow status)?

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

(In reply to Ammar Yahia, yahia.ammar.info@gmail.com, 15 March 2013 at 16:28.)


--

It works: I cleaned the environment and now it works. And now I have another
problem :-p

When I type the word "application", I want the documents that contain the
word "applications" to be returned as well,

but I only get documents that contain the word "application" in the result
list.

I have the following index with its analyzer:

curl -XPUT 'http://localhost:9200/docum/' -d '{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "synonym" : {
            "tokenizer" : "whitespace",
            "filter" : ["synonym"]
          }
        },
        "filter" : {
          "synonym" : {
            "type" : "synonym",
            "ignore_case" : true,
            "synonyms" : ["application => applications"]
          }
        }
      }
    }
  }
}'

and the following mapping:

curl -XPUT 'http://localhost:9200/docum/mydocu/_mapping' -d '
{
  "mydocu" : {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "path" : "full",
        "fields" : {
          "file" : {
            "type" : "string",
            "store" : "yes",
            "term_vector" : "with_positions_offsets",
            "index" : "analyzed"
          },
          "author" : {"type" : "string"},
          "title" : {"type" : "string", "store" : "yes"},
          "name" : {"type" : "string"},
          "date" : {"type" : "date", "format" : "dateOptionalTime"},
          "keywords" : {"type" : "string"},
          "content_type" : {"type" : "string"}
        }
      },
      "name" : {"type" : "string", "analyzer" : "keyword"},
      "pathEncoded" : {"type" : "string", "analyzer" : "keyword"},
      "postDate" : {"type" : "date", "format" : "dateOptionalTime"},
      "rootpath" : {"type" : "string", "analyzer" : "keyword"},
      "virtualpath" : {"type" : "string", "analyzer" : "keyword"}
    }
  }
}'

Did I make a mistake in the index or mapping
declaration?
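One plausible explanation, sketched below as a toy model rather than Elasticsearch code: in this mapping, the `file` sub-field declares no `analyzer`, so it is analyzed with the default analyzer, and the custom `synonym` analyzer defined in the index settings is never used. Since a field's analyzer is also used by default to analyze queries against it, the query term "application" is never rewritten either. Note that `QueryBuilders.queryString(keyword)` without a default field searches the `_all` field, which likewise uses the default analyzer unless configured otherwise. The class below (all names hypothetical) compares a field analyzed without the synonym rule against one analyzed with `application => applications` on both the index and query side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class FieldAnalyzerDemo {
    // Stand-in for a default analyzer: split on non-letters and lowercase.
    static List<String> plain(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Stand-in for the "synonym" analyzer with the one-way rule application => applications.
    static List<String> withSynonyms(String text) {
        Map<String, String> rules = Map.of("application", "applications");
        List<String> out = new ArrayList<>();
        for (String t : plain(text)) out.add(rules.getOrDefault(t, t));
        return out;
    }

    // A document matches if any analyzed query term equals an analyzed document term.
    // The same analyzer runs on both sides, as for a field queried with its own analyzer.
    static boolean matches(Function<String, List<String>> analyzer, String doc, String query) {
        List<String> docTerms = analyzer.apply(doc);
        for (String q : analyzer.apply(query)) {
            if (docTerms.contains(q)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "Mobile applications for iOS";
        System.out.println(matches(FieldAnalyzerDemo::plain, doc, "application"));        // false
        System.out.println(matches(FieldAnalyzerDemo::withSynonyms, doc, "application")); // true
    }
}
```

Applied to the real index, the change this model points to would be adding `"analyzer" : "synonym"` to the `file` sub-field of the attachment mapping (the earlier mapping set `"analyzer" : "french"` in that same spot), recreating the index and reindexing, and querying that field rather than `_all`, for instance via `defaultField("file")` on the query string builder; check that method against your client version.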

--

I also use this code to search:

    QueryBuilder query = QueryBuilders.queryString(keyword);
    SearchResponse searchHits = esClient.prepareSearch()
            .setIndices(INDEX_NAME).setTypes(INDEX_TYPE_DOC)
            .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
            .setFrom(start).setSize(size)
            .setQuery(query)
            .addHighlightedField("name")
            .addHighlightedField("file")
            .setHighlighterPreTags("")   // tag strings appear to have been stripped from the archived message
            .setHighlighterPostTags("")
            .execute().actionGet();
