JDBC River Fails When Mapping Is Used


(Ajay Singh) #1

We are facing a weird issue when using a mapping on an index. I am creating
the index using this syntax:

PUT /searchindex
{
  "index": {
    "analysis": {
      "filter": {
        "mynGram": {"type": "nGram", "min_gram": 2, "max_gram": 10}
      },
      "analyzer": {
        "myAnalyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "mynGram"]
        }
      }
    }
  }
}

After this, I apply the following mapping:

PUT /searchindex/searchproducts/_mapping
{
  "index_analyzer": "myAnalyzer",
  "search_analyzer": "standard",
  "properties": {
    "_id": {"type": "string", "include_in_all": false},
    "ProductID": {"type": "integer"},
    "CategoryID": {"type": "integer"},
    "ManufacturerID": {"type": "integer"},
    "ProductTitle": {"type": "string", "analyzer": "myAnalyzer"},
    "MFName": {"type": "string", "analyzer": "myAnalyzer"},
    "CategoryName": {"type": "string", "analyzer": "myAnalyzer"},
    "TechnicalSpecification": {"type": "string", "analyzer": "myAnalyzer"},
    "MfgPartNumber": {"type": "string", "analyzer": "myAnalyzer"},
    "MarketingText": {"type": "string", "analyzer": "myAnalyzer"},
    "ImageName": {"type": "string", "include_in_all": false},
    "IsActive": {"type": "boolean"},
    "IsFeatured": {"type": "boolean"},
    "IsInCatalog": {"type": "boolean"},
    "IsMASInsertSync": {"type": "integer"},
    "EDelivery": {"type": "boolean"},
    "ProductStatus": {"type": "integer"},
    "MFLogo": {"type": "string", "include_in_all": false},
    "MPartNumber": {"type": "string", "include_in_all": false},
    "ParentCategory": {"type": "integer"},
    "ContainsAccessory": {"type": "boolean"},
    "ContainsCompatible": {"type": "boolean"},
    "EMfID": {"type": "integer"},
    "ECategoryID": {"type": "integer"},
    "ESmallImage": {"type": "string", "include_in_all": false},
    "ELargeImage": {"type": "string", "include_in_all": false},
    "ProductType": {"type": "string", "analyzer": "myAnalyzer"},
    "ProductName": {"type": "string", "analyzer": "myAnalyzer"}
  }
}

After this, I am using the JDBC River created by JPrante to fetch data from
SQL Server 2005. The following is the script:

AllWebSearchProducts

Create River Script
PUT /_river/searchproductsriver/_meta
{
  "type": "jdbc",
  "jdbc": {
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "url": "jdbc:sqlserver://xxx.xx.x.xxx;databaseName=Products",
    "user": "adminaccount",
    "password": "password",
    "sql": "SELECT ProductID as _id FROM ProductAttributes(nolock)",
    "strategy": "oneshot",
    "index": "searchindex",
    "type": "searchproducts",
    "bulk_size": 500,
    "max_retries": 5,
    "max_retries_wait": "30s",
    "max_bulk_requests": 30,
    "bulk_flush_interval": "5s"
  }
}
The SQL string here is truncated to keep it simple; I am actually using a long statement with all the columns given in the mapping and multiple joins.

This river is supposed to bring in around 2.6 million products, but it stops after fetching 1.9 to 2 million. I have tried
running it many times, but it keeps failing. The river runs for around 3 hours 30 minutes.

However, if I run this river without creating the mapping, it works fine and all products are fetched into the index.

Please let me know if I am doing something wrong. Elasticsearch does not report or log any error; the river simply stops running.

Thanks & Regards,
Ajay.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0ee8c762-6fb5-4326-a126-f0565a9dc5cd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

From the symptoms you describe, it may be that the ngram tokenization puts
more load on the cluster than it can handle, but without error messages this
can only be a blind guess.
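
A rough way to see the load involved is to count how many character n-grams a filter with min_gram 2 and max_gram 10 would emit per token. The Python sketch below assumes a simplified model in which a token of length L yields L - n + 1 grams for each gram size n that fits; it splits on whitespace rather than using the standard tokenizer, and the helper names and sample text are made up for illustration.

```python
def ngram_count(token_length: int, min_gram: int = 2, max_gram: int = 10) -> int:
    """Number of character n-grams emitted for one token (simplified model)."""
    total = 0
    for n in range(min_gram, max_gram + 1):
        # A gram of size n only fits if the token is at least n characters long.
        if token_length >= n:
            total += token_length - n + 1
    return total


def grams_for_text(text: str, min_gram: int = 2, max_gram: int = 10) -> int:
    """Sum the n-gram counts over whitespace-separated tokens (hypothetical helper)."""
    return sum(ngram_count(len(tok), min_gram, max_gram) for tok in text.split())


if __name__ == "__main__":
    for length in (5, 10, 20):
        print(length, ngram_count(length))
    print(grams_for_text("wireless keyboard"))
```

Under this model a single 10-character token turns into 45 indexed terms instead of one, and the mapping above applies myAnalyzer explicitly to eight string fields, so the indexing cost per document could plausibly be much higher than with the default analyzer.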

You could play with max_bulk_requests and bulk_size to reduce the number of
docs submitted at once to see if this works better.
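
As a rough illustration of that tuning, the sketch below computes an upper bound on the documents in flight. It assumes that bulk_size is the number of documents per bulk request and that max_bulk_requests caps the number of concurrent bulk requests; check the river's README for your version. The reduced values are arbitrary examples, not recommendations.

```python
def max_docs_in_flight(bulk_size: int, max_bulk_requests: int) -> int:
    """Upper bound on documents submitted at once, under the assumptions above."""
    return bulk_size * max_bulk_requests


# Settings from the river definition in the original post.
current = {"bulk_size": 500, "max_bulk_requests": 30}
# Hypothetical smaller values to try; tune them against your own cluster.
reduced = {"bulk_size": 100, "max_bulk_requests": 5}

if __name__ == "__main__":
    print("current:", max_docs_in_flight(**current))
    print("reduced:", max_docs_in_flight(**reduced))
```

With the posted settings the bound is 15,000 documents at once; the example values bring it down to 500.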

Jörg

On Fri, Apr 11, 2014 at 1:40 PM, Ajay Singh ajay0055@gmail.com wrote:



