Hi,
I have a requirement to perform a computation using the total document count and term vector statistics of the document terms.
For this I am using:
- The Count API (https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/count.html) to get the total number of documents of a given type in the Elasticsearch index.
- The Term Vectors API (https://www.elastic.co/guide/en/elasticsearch/reference/1.4/docs-termvectors.html) to get the document frequency of each term.
I am using the Java client API to access the Elasticsearch server.
I index a document through the Java API and check IndexResponse.isCreated() to confirm that indexing has completed.
However, if I retrieve the document count and term vectors immediately after indexing the document, I get stale values that do not reflect the new document.
If I wait about 30 seconds before querying, I get the up-to-date document count and term vector information.
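For reference, the sequence I run can be sketched with REST calls that are roughly equivalent to my Java client calls. This is only an illustration: the type name "doc", the id "1", and the field "body" are placeholders rather than my real names, and the _termvector endpoint and term_statistics parameter are taken from the 1.4 documentation linked above.

```shell
# Index one document (type "doc", id 1 and field "body" are placeholder names)
curl -XPUT '127.0.0.1:9200/test/doc/1' -d '{"body": "some sample text"}'

# Immediately request the document count for that type
curl -XGET '127.0.0.1:9200/test/doc/_count'

# Immediately request term vectors, including term statistics such as doc_freq
curl -XGET '127.0.0.1:9200/test/doc/1/_termvector?term_statistics=true'
```

The second and third calls are the ones that return the old values when issued right after the first.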
I have created a single index on a single node with a single shard using the following command.
curl -XPUT '127.0.0.1:9200/test' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "myAnalyzer1": {
          "type": "custom",
          "tokenizer": "PatternAnalyzer"
        }
      },
      "tokenizer": {
        "PatternAnalyzer": {
          "type": "pattern",
          "pattern": "patternvalue"
        }
      }
    },
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  }
}'
The cluster health output is shown below.
{
  "cluster_name": "elasticsearch",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 30,
  "active_shards": 30,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 15,
  "number_of_pending_tasks": 0
}
Is this expected behavior? If so, please suggest a strategy for getting accurate, up-to-date document counts and term vector information right after indexing.
Regards,
Kush