What exactly is the content of an index file in Elasticsearch?

Daniel_C_S_Yeh · June 4, 2015, 2:33am

Dear all,

I have fully understood the mechanism of indexing a document in Elasticsearch, as in the example here:

https://www.elastic.co/guide/en/elasticsearch/guide/current/inverted-index.html

But I have one more question: what exactly is the content of the index file?

And how can I open it to see the dictionary table?

Many thanks

nik9000 · June 4, 2015, 1:04pm

While it doesn't answer all of your questions, I like this page:
http://lucene.apache.org/core/4_5_0/core/org/apache/lucene/codecs/lucene45/package-summary.html

Jason_Wee · June 4, 2015, 1:59pm

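	// Written against the Lucene 4.x API (FSDirectory.open(File), terms.iterator(null)).
	import java.io.File;
	import org.apache.lucene.index.*;
	import org.apache.lucene.store.FSDirectory;
	import org.apache.lucene.util.BytesRef;

	IndexReader indexReader = DirectoryReader.open(FSDirectory.open(new File("clean/index.termrange")));
	// Wrap once so the whole index can be read as a single segment.
	AtomicReader reader = SlowCompositeReaderWrapper.wrap(indexReader);

	// all fields
	reader.getFieldInfos().forEach(x -> System.out.println(x.name));

	// term dictionary of the "contents" field
	Terms terms = reader.terms("contents");
	TermsEnum iter = terms.iterator(null);

	BytesRef byteRef = null;
	while ((byteRef = iter.next()) != null) {
		// Note: this String constructor decodes with the platform default charset.
		String term = new String(byteRef.bytes, byteRef.offset, byteRef.length);

		// term : document frequency : total term frequency
		System.out.format("%-10s:%2d:%2d %n", term, indexReader.docFreq(new Term("contents", term)), indexReader.totalTermFreq(new Term("contents", term)));
	}

	// field-level totals
	System.out.println(terms.getSumTotalTermFreq());
	System.out.println(terms.getSumDocFreq());
	System.out.println(indexReader.getSumTotalTermFreq("contents"));

	indexReader.close();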

Lucene index files are binary, so hexdump cannot really print anything useful. You need to write code to read the index from the directory, then loop through the terms and get each term's frequencies from the index reader object.
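
To make the terms and frequencies mentioned above a bit more concrete, here is a toy sketch in plain Java with no Lucene dependency. It only models the idea of a term dictionary for one field (term, document frequency, total term frequency); it is not Lucene's on-disk format, which is compressed and split across several binary files. The class `ToyTermDictionary`, its method names, and the sample sentences are made up for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a term dictionary for a single field (not Lucene's file format).
class ToyTermDictionary {
    // term -> {docFreq, totalTermFreq}; TreeMap keeps the terms sorted
    private final Map<String, long[]> stats = new TreeMap<>();

    // Index one document: lowercase it and split on anything that is not a letter or digit.
    void addDocument(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            long[] s = stats.computeIfAbsent(e.getKey(), k -> new long[2]);
            s[0] += 1;            // one more document contains this term
            s[1] += e.getValue(); // occurrences of the term in this document
        }
    }

    // Number of documents containing the term (0 if unknown).
    long docFreq(String term) {
        long[] s = stats.get(term);
        return s == null ? 0 : s[0];
    }

    // Number of occurrences of the term across all documents (0 if unknown).
    long totalTermFreq(String term) {
        long[] s = stats.get(term);
        return s == null ? 0 : s[1];
    }

    // Sum of docFreq over all terms.
    long sumDocFreq() {
        return stats.values().stream().mapToLong(s -> s[0]).sum();
    }

    // Sum of totalTermFreq over all terms, i.e. the total number of tokens indexed.
    long sumTotalTermFreq() {
        return stats.values().stream().mapToLong(s -> s[1]).sum();
    }

    public static void main(String[] args) {
        ToyTermDictionary dict = new ToyTermDictionary();
        dict.addDocument("The quick brown fox");
        dict.addDocument("The lazy dog and the quick dog");
        // Same "term:docFreq:totalTermFreq" layout as the Lucene loop
        for (String term : dict.stats.keySet()) {
            System.out.format("%-10s:%2d:%2d %n", term, dict.docFreq(term), dict.totalTermFreq(term));
        }
        System.out.println(dict.sumTotalTermFreq());
        System.out.println(dict.sumDocFreq());
    }
}
```

The sorted map stands in for the sorted term dictionary. A real index also stores postings (which documents contain each term), which this sketch leaves out.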

hth

jason

colings86 · June 4, 2015, 2:06pm

I haven't used it for a while, but Luke used to be a good UI app for inspecting Lucene indices, and it seems to support inspecting Elasticsearch indices now too.

Disclaimer: I have not run Luke since it was hosted on Google Code.

IMPORTANT: I would definitely copy the indices to a new folder (outside of your ES directory) and point Luke at that copy to make sure it doesn't corrupt the index somehow.
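
A minimal sketch of that copy step in plain Java, for anyone who prefers not to do it by hand. The class `IndexCopy` and its `copyTree` method are hypothetical names, and both directories are supplied by the caller since the actual data path depends on the installation. It is probably safest to run it while nothing is writing to the index.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Copies a directory tree so that inspection tools work on a scratch copy.
class IndexCopy {
    static void copyTree(Path source, Path target) throws IOException {
        // Files.walk visits a directory before its contents,
        // so each parent directory exists before files are copied into it.
        try (Stream<Path> paths = Files.walk(source)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path dest = target.resolve(source.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest);
                } else {
                    // Fails if dest already exists, so nothing is overwritten silently.
                    Files.copy(p, dest);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage (paths are placeholders): java IndexCopy <index directory> <scratch directory>
        copyTree(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The original directory is only read, never modified, so the tool can then be pointed at the scratch directory.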

nik9000 · June 4, 2015, 2:28pm

I don't know if this solution is the "right way" to do it in general, but it will work and teach you some things.

You probably want to be careful of that new String call there - I don't think it'll work properly. I'd go with BytesRef.utf8ToString or UnicodeUtil.UTF8toUTF16.
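
To illustrate the concern, here is a small sketch that uses only the JDK, no Lucene. `new String(byte[], int, int)` decodes with the platform default charset, whereas `BytesRef.utf8ToString` treats the bytes as UTF-8. The class `TermDecoding` and its methods are made up for the example, and ISO-8859-1 merely stands in for a platform whose default charset is not UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Shows how the same UTF-8 term bytes decode under two different charsets.
class TermDecoding {
    // Decode a slice of a byte array explicitly as UTF-8.
    static String decodeUtf8(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    // Decode the same slice as ISO-8859-1, standing in for a non-UTF-8 platform default.
    static String decodeLatin1(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the e: five bytes in UTF-8.
        byte[] raw = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        String good = decodeUtf8(raw, 0, raw.length);
        String bad = decodeLatin1(raw, 0, raw.length);
        System.out.println(good.length()); // 4 characters: the term round-trips
        System.out.println(bad.length());  // 5 characters: the accented letter became two
    }
}
```

Pure ASCII terms decode identically either way, which is why the problem can go unnoticed until a non-ASCII term shows up.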
