Tokenizer analysis is not working properly in the JDBC river


(samaswin) #1

I used the following river definition:

{
  "type": "jdbc",
  "jdbc": {
    "strategy": "simple",
    "driver": "com.mysql.jdbc.Driver",
    "url": "jdbc:mysql://192.168.43.204:3306/BacheCompany",
    "user": "turbodbuser",
    "password": "turbo@2012",
    "sql": "SELECT DISTINCT b.joMasterID as _id,a.jobStatus,a.joStatusID,b.*,c.Name FROM joMaster AS b LEFT JOIN joStatus AS a ON b.jobstatus=a.joStatusID LEFT JOIN rxMaster AS c ON c.rxMasterID=b.rxCustomerID",
    "poll": "10s",
    "autocommit": true
  },
  "index": {
    "index": "turbo1",
    "type": "jobs1",
    "bulk_timeout": "60s",
    "index_settings": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": [
              "synonym"
            ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "format": "wordnet",
            "synonyms_path": "analysis/prolog/wn_s.pl"
          }
        }
      }
    }
  }
}

But the tokenizer is not working for me. Can anyone suggest a solution?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #2

Create your index and the mapping first.
Only then, create the river.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
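David's advice boils down to two requests sent in a fixed order. Below is a minimal Python sketch (standard library only), assuming a local node at `localhost:9200`, the 0.90-era `string` field type, and the `_river/<name>/_meta` endpoint that rivers were registered with at the time. The river name `my_jdbc_river` and the analyzed column `Name` are hypothetical choices. The river body leaves out `index_settings` on the assumption that the river does not need them once the index already exists. Nothing is sent until `create_index_then_river()` is called.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node on the default HTTP port

# Analysis settings copied from the river's "index_settings" above.
ANALYSIS = {
    "analyzer": {"synonym": {"tokenizer": "whitespace", "filter": ["synonym"]}},
    "filter": {
        "synonym": {
            "type": "synonym",
            "format": "wordnet",
            "synonyms_path": "analysis/prolog/wn_s.pl",
        }
    },
}

# JDBC settings copied from the river definition above.
JDBC = {
    "strategy": "simple",
    "driver": "com.mysql.jdbc.Driver",
    "url": "jdbc:mysql://192.168.43.204:3306/BacheCompany",
    "user": "turbodbuser",
    "password": "turbo@2012",
    "sql": "SELECT DISTINCT b.joMasterID as _id,a.jobStatus,a.joStatusID,b.*,c.Name "
           "FROM joMaster AS b LEFT JOIN joStatus AS a ON b.jobstatus=a.joStatusID "
           "LEFT JOIN rxMaster AS c ON c.rxMasterID=b.rxCustomerID",
    "poll": "10s",
    "autocommit": True,
}


def index_body(type_name, synonym_fields):
    """Index settings plus a mapping that analyzes each listed field with 'synonym'."""
    properties = {f: {"type": "string", "analyzer": "synonym"} for f in synonym_fields}
    return {
        "settings": {"analysis": ANALYSIS},
        "mappings": {type_name: {"properties": properties}},
    }


def river_body(index_name, type_name):
    """River definition; index_settings is omitted because the index already exists."""
    return {
        "type": "jdbc",
        "jdbc": JDBC,
        "index": {"index": index_name, "type": type_name, "bulk_timeout": "60s"},
    }


def es_put(path, body):
    """PUT a JSON body to Elasticsearch and return the decoded JSON response."""
    req = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def create_index_then_river():
    # 1. Index and mapping first ("Name" is a hypothetical choice of field).
    es_put("/turbo1", index_body("jobs1", ["Name"]))
    # 2. Only then register the river ("my_jdbc_river" is a hypothetical name).
    es_put("/_river/my_jdbc_river/_meta", river_body("turbo1", "jobs1"))
```

Splitting payload construction from the HTTP call keeps the bodies easy to inspect before anything reaches the cluster.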

(In reply to sam aswin, samaswin@gmail.com, 15 Oct 2013 at 07:41.)


(samaswin) #3

Thanks David, I understand the concept.
I need to configure the WordNet Prolog file in Elasticsearch.
How can I do that? The settings are:

{
  "index" : {
    "analysis" : {
      "analyzer" : {
        "synonym" : {
          "tokenizer" : "whitespace",
          "filter" : ["synonym"]
        }
      },
      "filter" : {
        "synonym" : {
          "type" : "synonym",
          "format" : "wordnet",
          "synonyms_path" : "analysis/wn_s.pl"
        }
      }
    }
  }
}

But it is still not working. Did I make a mistake?



(David Pilato) #4

You should first try to get the analyzer working as expected without launching any river.

So, basically, create a full curl recreation:

  • delete the index
  • create the index with the analyzer
  • use the analyze API to check that your analyzer is correct

If it is not, gist your curl recreation and we can have a look at it.

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs
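The three steps David lists can be scripted. Here is a minimal Python sketch (standard library only), assuming a local node at `localhost:9200`, the `projects` index and `analysis/wn_s.pl` path used later in this thread, and the query-string form of `_analyze` (`?analyzer=...&text=...`) accepted by Elasticsearch releases of this era; newer releases expect those parameters in a JSON body instead.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node on the default HTTP port
INDEX = "projects"                # index name used later in this thread

# Index settings with the synonym analyzer, as posted in this thread.
SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {"synonym": {"tokenizer": "whitespace", "filter": ["synonym"]}},
            "filter": {
                "synonym": {
                    "type": "synonym",
                    "format": "wordnet",
                    "synonyms_path": "analysis/wn_s.pl",
                }
            },
        }
    }
}


def es_request(method, path, body=None):
    """Send an optional JSON body and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        ES_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def analyze_path(index, analyzer, text):
    """Build an _analyze URL in the query-string form (?analyzer=...&text=...)."""
    query = urllib.parse.urlencode({"analyzer": analyzer, "text": text})
    return f"/{index}/_analyze?{query}"


def token_terms(analyze_response):
    """List the terms an _analyze response reports, in order."""
    return [t["token"] for t in analyze_response.get("tokens", [])]


def recreate_and_check(word="child"):
    # 1. Delete the index; a 404 only means it did not exist yet.
    try:
        es_request("DELETE", f"/{INDEX}")
    except urllib.error.HTTPError as err:
        if err.code != 404:
            raise
    # 2. Create the index with the analyzer.
    es_request("PUT", f"/{INDEX}", SETTINGS)
    # 3. Ask the analyzer what it makes of one word.
    return token_terms(es_request("GET", analyze_path(INDEX, "synonym", word)))
```

If `recreate_and_check("child")` returns only `["child"]`, the analyzer itself is not producing synonyms, so the problem lies in the analysis settings or the synonym file rather than in the river.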

(In reply to sam aswin, samaswin@gmail.com, 15 Oct 2013 at 09:51.)


(samaswin) #5

I created an analysis folder in the Elasticsearch directory. Is this right?



(David Pilato) #6

As far as I recall, it should be under the config directory (not sure, though).
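Following David's recollection, a relative `synonyms_path` is resolved against the node's config directory, while an absolute path is used as-is. A small Python pre-flight check under that assumption shows where the file would have to be; the config directory is whatever your installation uses, for example the `/usr/share/elasticsearch/config` mentioned later in the thread.

```python
from pathlib import Path


def resolve_synonyms_path(config_dir, synonyms_path):
    """Absolute paths are used as-is; relative ones are joined to the config directory."""
    path = Path(synonyms_path)
    return path if path.is_absolute() else Path(config_dir) / path


def check_synonyms_file(config_dir, synonyms_path):
    """Return the resolved path and whether a regular file exists there."""
    resolved = resolve_synonyms_path(config_dir, synonyms_path)
    return resolved, resolved.is_file()


if __name__ == "__main__":
    # Both synonyms_path values used in this thread, against one assumed config directory.
    for rel in ("analysis/prolog/wn_s.pl", "analysis/wn_s.pl"):
        print(check_synonyms_file("/usr/share/elasticsearch/config", rel))
```

Note that the river definition and the later index settings use different paths (`analysis/prolog/wn_s.pl` versus `analysis/wn_s.pl`), so only one of them can match a given file location.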

(In reply to sam aswin, samaswin@gmail.com, 15 Oct 2013 at 09:54.)


(samaswin) #7

Thanks David.
But the problem still exists, and there is no error in the log either.



(David Pilato) #8

Then Gist the full curl recreation.

(In reply to sam aswin, samaswin@gmail.com, 15 Oct 2013 at 10:08.)


(samaswin) #9

I use RESTClient in Firefox:

Index mapping
PUT:
http://localhost:9200/projects/
{
  "index" : {
    "analysis" : {
      "analyzer" : {
        "synonym" : {
          "tokenizer" : "whitespace",
          "filter" : ["synonym"]
        }
      },
      "filter" : {
        "synonym" : {
          "type" : "synonym",
          "format" : "wordnet",
          "synonyms_path" : "analysis/wn_s.pl"
        }
      }
    }
  }
}

PUT:
http://localhost:9200/projects/project/2
{
  "name" : "child"
}

http://localhost:9200/projects/project/2
{
  "name" : "baby"
}

POST:
http://localhost:9200/projects/_search?pretty=true
{
  "query": {
    "query_string": {
      "query": "child"
    }
  }
}



(David Pilato) #10

What output do you get?

(In reply to sam aswin, samaswin@gmail.com, 15 Oct 2013 at 10:18.)


(samaswin) #11
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "projects",
        "_type": "project",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "kid"
        }
      }
    ]
  }
}



(samaswin) #12
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "projects",
        "_type": "project",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "child"
        }
      }
    ]
  }
}



(David Pilato) #13

So baby and child are synonyms in your file, right?
And you expect 2 results, right?

In your example you set the same id for both docs, so the second one replaced the first.

I think it's a typo, but I'm not sure.

It would help a lot if you gist a full curl recreation (with the synonym file as well).
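David's first question can be checked directly against the file. In the WordNet Prolog distribution, `wn_s.pl` holds one fact per word sense, `s(synset_id,w_num,'word',ss_type,sense_number,tag_count).`, and two words are synonyms when they share a `synset_id`. A minimal Python sketch built on that description; the rule that an apostrophe inside a quoted word is written twice is an assumption of this parser, and the synset ids in the test data are made up.

```python
import re
from collections import defaultdict

# One word sense per line, e.g. s(synset_id,w_num,'word',ss_type,sense_number,tag_count).
# Assumption: an apostrophe inside the quoted word is written twice ('').
S_FACT = re.compile(r"^s\((\d+),(\d+),'((?:[^']|'')*)',(\w),(\d+),(\d+)\)\.")


def synsets_by_word(lines):
    """Map each lower-cased word to the set of synset ids it belongs to."""
    index = defaultdict(set)
    for line in lines:
        match = S_FACT.match(line.strip())
        if match:
            word = match.group(3).replace("''", "'").lower()
            index[word].add(match.group(1))
    return index


def are_synonyms(index, a, b):
    """True when both words appear in at least one common synset."""
    return bool(index.get(a.lower(), set()) & index.get(b.lower(), set()))


def check_file(path, a, b):
    """Read a wn_s.pl file and test one pair of words."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return are_synonyms(synsets_by_word(fh), a, b)
```

For example, `check_file("/usr/share/elasticsearch/config/analysis/wn_s.pl", "child", "kid")` answers the question for the pair used later in this thread.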

(In reply to sam aswin, samaswin@gmail.com, 15 Oct 2013 at 10:48.)


(samaswin) #14

Yes, I need the two results. The example above had a typo. If I search with
GET:
http://localhost:9200/projects/_search?pretty=true

I get these results:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "projects",
        "_type": "project",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "child"
        }
      },
      {
        "_index": "projects",
        "_type": "project",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "kid"
        }
      }
    ]
  }
}

The mistakes above were typos; I added this output for reference.

(In reply to David Pilato, Tuesday, 15 October 2013, 14:24:01 UTC+5:30.)


(samaswin) #15

I deleted the entire index and recreated it, but I still get the same problem.



(samaswin) #16

PUT:

http://localhost:9200/projects/

{
  "index": {
    "analysis": {
      "analyzer": {
        "synonym": {
          "tokenizer": "whitespace",
          "filter": [
            "synonym"
          ]
        }
      },
      "filter": {
        "synonym": {
          "type": "synonym",
          "format": "wordnet",
          "synonyms_path": "analysis/wn_s.pl"
        }
      }
    }
  }
}

PUT:
http://localhost:9200/projects/project/1

{
  "name" : "kid"
}

http://localhost:9200/projects/project/2

{
  "name" : "child"
}

POST:
http://localhost:9200/projects/_search?pretty=true
{
  "query": {
    "query_string": {
      "query": "child"
    }
  }
}

Analysis folder:
/usr/share/elasticsearch/config/analysis/

WordNet Prolog file:
/usr/share/elasticsearch/config/analysis/wn_s.pl

WordNet version: 3.0

RESULT:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "projects",
        "_type": "project",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "child"
        }
      }
    ]
  }
}

Expected result:

Both kid and child should be displayed.



(Jörg Prante) #17

You forgot the mapping for the field "name", compare your commands to

Jörg
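Jörg's fix is to map the `name` field to the `synonym` analyzer when the index is created; without a mapping, `name` is dynamically mapped with the default analyzer and the synonym filter is never applied to it. A minimal Python sketch of the corrected recreation, assuming a local node, the 0.90-era `string` type and `project` mapping type, and, as this sketch's own choice rather than something from the thread, a `default_field` on the search so the query text goes through `name`'s analyzer instead of the catch-all `_all` field.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node on the default HTTP port


def projects_index_body():
    """Settings from the thread plus the missing mapping for project.name."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "synonym": {"tokenizer": "whitespace", "filter": ["synonym"]}
                },
                "filter": {
                    "synonym": {
                        "type": "synonym",
                        "format": "wordnet",
                        "synonyms_path": "analysis/wn_s.pl",
                    }
                },
            }
        },
        "mappings": {
            "project": {
                "properties": {"name": {"type": "string", "analyzer": "synonym"}}
            }
        },
    }


def search_body(text):
    """query_string search against 'name' (default_field is this sketch's choice)."""
    return {"query": {"query_string": {"default_field": "name", "query": text}}}


def es_request(method, path, body=None):
    """Send an optional JSON body and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        ES_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_recreation():
    """Create the index with its mapping, index two documents, search for 'child'."""
    es_request("PUT", "/projects", projects_index_body())
    # Distinct ids, so the second document does not replace the first.
    es_request("PUT", "/projects/project/1", {"name": "kid"})
    es_request("PUT", "/projects/project/2", {"name": "child"})
    # Make the new documents visible to search right away.
    es_request("POST", "/projects/_refresh")
    result = es_request("POST", "/projects/_search", search_body("child"))
    return sorted(hit["_source"]["name"] for hit in result["hits"]["hits"])
```

The index must not exist beforehand (delete it first). If `child` and `kid` share a synset in `wn_s.pl`, `run_recreation()` should then return both names.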



(samaswin) #18

Thanks Jörg,
I solved my problem. Thanks David for your help.


