Using the termvectors endpoint, I get different values for ttf for the same term, depending on which document I request.
For doc #1, I get:
"death": {
"doc_freq": 2069,
"ttf": 14447,
"term_freq": 42
…
Document #2 in the list is:
"death": {
"doc_freq": 1961,
"ttf": 12227,
"term_freq": 14
…
And for yet another document, I get:
"death": {
"doc_freq": 1989,
"ttf": 12851,
"term_freq": 8,
So, why isn’t ttf the same for every document in a given index?
I'd like to find out if there's a way to get a count for a term within a document and a count for a term within the index, so if termvectors won't do that, is there something that will?
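To make the two counts concrete, here is a minimal Python sketch that pulls them out of a termvectors response. It assumes the response was requested with term statistics enabled and that the analyzed field is called `text` (a hypothetical name; the field is not shown in my output above). The numbers are the ones I got for doc #1. The helper `term_counts` is just for illustration, not part of any client library.

```python
import json

# Sample response shaped like a _termvectors reply with term statistics.
# The field name "text" is an assumption; the numbers are those reported
# for doc #1 above.
SAMPLE_RESPONSE = json.loads("""
{
  "found": true,
  "term_vectors": {
    "text": {
      "terms": {
        "death": {"doc_freq": 2069, "ttf": 14447, "term_freq": 42}
      }
    }
  }
}
""")


def term_counts(response, field, term):
    """Return (term_freq, doc_freq, ttf) for one term, or None if absent."""
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    stats = terms.get(term)
    if stats is None:
        return None
    # term_freq: occurrences of the term in this one document.
    # doc_freq and ttf: the wider statistics, the ones that differ
    # between the documents listed above.
    return stats.get("term_freq"), stats.get("doc_freq"), stats.get("ttf")


print(term_counts(SAMPLE_RESPONSE, "text", "death"))
```

Here `term_freq` is the per-document count I want, and `ttf` is the value I expected to be an index-wide count.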
It's possible; there appear to be 5 shards (0, 1, 2, 3 and 4). All of my indexes were created this way. I didn't specify the number of shards when I created them, so 5 must be some kind of default.
Is there a way to get a count across all shards?
I created an index with 1 shard, then used _reindex to populate it. Now the _termvectors interface tells me that there are 489,760 documents when I know that there are only 424,883. So, I guess this is a result of using the _reindex interface? And I guess that the extra 64,877 documents are actually "deleted" documents, since the total count from a "Discover" search for "*" in Kibana shows the correct number for the index: 424,883.
Am I now "stuck" with these extra document counts (and term counts)?
Indeed there were deleted documents, even though there weren't any in the index before I reindexed from 5 shards to 1:
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   index1    MbWtduzaQnusompMjG0Urg   5   1     424883            0      3.7gb          3.7gb
yellow open   index1_v1 t987tbkFSWOeORF5u62SSw   1   1     424883        64877      4.2gb          4.2gb
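As a sanity check on the numbers, this small Python sketch parses the two rows above and adds docs.count to docs.deleted. It assumes that the document count reported alongside the term vectors includes deleted documents; that is my guess, not something I have confirmed. The helper names are made up for this example.

```python
# Rows copied from the _cat/indices output above.
CAT_OUTPUT = """\
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open index1 MbWtduzaQnusompMjG0Urg 5 1 424883 0 3.7gb 3.7gb
yellow open index1_v1 t987tbkFSWOeORF5u62SSw 1 1 424883 64877 4.2gb 4.2gb
"""


def parse_cat_indices(text):
    """Turn whitespace-separated _cat/indices output into a list of dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def docs_including_deleted(row):
    """Live documents plus documents marked deleted but not yet merged away."""
    return int(row["docs.count"]) + int(row["docs.deleted"])


rows = {row["index"]: row for row in parse_cat_indices(CAT_OUTPUT)}
for name, row in rows.items():
    print(name, docs_including_deleted(row))
```

For index1_v1 the sum is 489760, the same figure _termvectors gave me, which is consistent with the extra documents being the deleted ones.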
However, there appear to be 1,616 deleted documents that won't go away with a forcemerge:
POST /index1_v1/_forcemerge?only_expunge_deletes=true
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   index1    MbWtduzaQnusompMjG0Urg   5   1     424883            0      3.7gb          3.7gb
yellow open   index1_v1 t987tbkFSWOeORF5u62SSw   1   1     424883         1616      3.7gb          3.7gb
Is there a reason why deleted documents wouldn't be deleted by forcemerge?
Thanks,
David