Elasticsearch with Java Bulk API: problem with indexing


(Michał Zawadzki) #1

Hi,
I'm using Elasticsearch 5.2.2 with Java 8. I'm running Elasticsearch locally with default settings on Windows 10. In my mapping I have defined two types, vod and episode, which are connected by a parent-child relation:

{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "ngram_filter": {
            "type": "nGram",
            "min_gram": 1,
            "max_gram": 20,
            "token_chars": ["letter", "digit", "punctuation", "symbol"]
          }
        },
        "analyzer": {
          "autosuggest": {
            "tokenizer": "autosuggest",
            "filter": "lowercase",
            "term_vector": "with_positions_offsets"
          },
          "ngram_analyzer": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": ["lowercase", "asciifolding", "ngram_filter"]
          },
          "whitespace_analyzer": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": ["lowercase", "asciifolding"]
          }
        },
        "tokenizer": {
          "autosuggest": {
            "type": "edgeNGram",
            "min_gram": "1",
            "max_gram": "255",
            "token_chars": ["letter", "digit", "punctuation"]
          }
        }
      }
    }
  },
  "mappings": {
    "vod": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "autosuggest"
        },
        "description": {
          "analyzer": "ngram_analyzer",
          "search_analyzer": "whitespace_analyzer",
          "type": "string"
        },
        "actors": {
          "properties": {
            "name": {
              "type": "string",
              "analyzer": "autosuggest"
            }
          }
        },
        "directors": {
          "properties": {
            "name": {
              "type": "string",
              "analyzer": "autosuggest"
            }
          }
        },
        "scriptwriters": {
          "properties": {
            "name": {
              "type": "string",
              "analyzer": "autosuggest"
            }
          }
        },
        "schedules": {
          "properties": {
            "items": {
              "type": "nested",
              "properties": {
                "since": {
                  "type": "date",
                  "format": "yyyy-MM-dd HH:mm:ss"
                },
                "till": {
                  "type": "date",
                  "format": "yyyy-MM-dd HH:mm:ss"
                }
              }
            }
          }
        }
      }
    },
    "epg": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "autosuggest"
        },
        "description": {
          "analyzer": "ngram_analyzer",
          "search_analyzer": "whitespace_analyzer",
          "type": "string"
        },
        "since": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss"
        },
        "till": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss"
        },
        "live": {
          "properties": {
            "schedules": {
              "properties": {
                "items": {
                  "type": "nested",
                  "properties": {
                    "since": {
                      "type": "date",
                      "format": "yyyy-MM-dd HH:mm:ss"
                    },
                    "till": {
                      "type": "date",
                      "format": "yyyy-MM-dd HH:mm:ss"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

I index the documents using the Bulk API, in batches of 200 elements:

@TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
private void executeIndexVod(List<Vod> vods) {
	BulkRequestBuilder bulkRequest = elasticSearchClient.getClient().prepareBulk();
	ObjectMapper mapper = new ObjectMapper();

	for (Vod vod : vods) {
		try {
			String serializedToJson = mapper.writerFor(VodDto.class).writeValueAsString(
					VodSerializerRemote.serialize(vod, Task.BASE, Task.MODIFIED_AT, Task.LAST_INDEXED_AT, Task.EXTERNAL_UIDS,
							Task.DESCRIPTION, Task.PLATFORMS, Task.COVERS, Task.PERSONS, Task.MAIN_CATEGORY, Task.CATEGORIES,
							Task.TAG_SLUGS, Task.DISPLAY_SCHEDULES_WITH_PLATFORMS, Task.LOGO, Task.HIDDEN));
			bulkRequest.add(elasticSearchClient.getClient().prepareIndex(SearchServiceLocal.INDEX_NAME,
					SearchServiceLocal.PRODUCT_VOD_INDEX_TYPE, Integer.toString(vod.getId())).setSource(serializedToJson));
		} catch (JsonProcessingException e) {
			logger.error("[ELASTIC] Error serializing vod for elastic, id: {}, {}", vod.getId(), e.getLocalizedMessage());
		}
	}

	bulkRequest.execute().actionGet();
}
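
The method above receives one batch at a time; the step that cuts the full list into batches of 200 is not shown. A minimal sketch of how that split could look, assuming a hypothetical helper class BatchSplitter (the class and method names are illustrative and not part of the original code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not taken from the original code base.
final class BatchSplitter {

    private BatchSplitter() {
    }

    /**
     * Splits items into consecutive batches of at most batchSize elements.
     * The last batch may be smaller than batchSize.
     */
    static <T> List<List<T>> split(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the range so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }
}
```

Each list returned by BatchSplitter.split(allVods, 200) would then be passed to executeIndexVod, so that no single bulk request grows beyond 200 index operations.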

After each bulk request I run: curl -X GET localhost:9200/product/vod/_count. For the first few batches everything works fine: my counter in Java and the document count are exactly the same. After about 2000 elements have been indexed successfully, problems start to occur. As a result, only about 7600/7700 vods and 60000/65000 episodes end up indexed.
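
The same count check can be done from Java, so that the comparison with the local counter happens automatically after every batch. This is only a sketch under stated assumptions: the class CountCheck and its method names are hypothetical, the URL is the one from the curl command, and the response body is assumed to contain a numeric "count" field, which is pulled out with a regular expression instead of a JSON parser to keep the example dependency-free.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original code.
final class CountCheck {

    // Matches the numeric "count" field of a _count response body.
    private static final Pattern COUNT = Pattern.compile("\"count\"\\s*:\\s*(\\d+)");

    private CountCheck() {
    }

    /** Extracts the document count from a _count response body. */
    static long parseCount(String json) {
        Matcher matcher = COUNT.matcher(json);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No count field in response: " + json);
        }
        return Long.parseLong(matcher.group(1));
    }

    /** Sends a GET request to the given _count URL and returns the parsed count. */
    static long fetchCount(String countUrl) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(countUrl).openConnection();
        connection.setRequestMethod("GET");
        try (InputStream in = connection.getInputStream();
             Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name())) {
            // The \A delimiter makes the scanner return the whole body as one token.
            String body = scanner.useDelimiter("\\A").hasNext() ? scanner.next() : "";
            return parseCount(body);
        } finally {
            connection.disconnect();
        }
    }
}
```

Calling CountCheck.fetchCount("http://localhost:9200/product/vod/_count") after each batch gives a number to compare against the Java counter. Two things are worth keeping in mind when reading it. The count only includes documents that a refresh has already made visible, so a value read immediately after a bulk call can lag behind. Documents that were rejected inside a bulk request never show up at all; those are reported per item in the BulkResponse returned by bulkRequest.execute().actionGet(), for example through hasFailures() and buildFailureMessage().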

Could you please tell me what the reason could be and how I can solve this problem?

