Trouble accessing Lucene _source values

I'm building integration tests for an Elasticsearch 5.5 plugin I wrote that pulls values from a document and generates a score based on those values. Here's how I set up my test cluster:

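import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import org.elasticsearch.action.index.IndexRequestBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.common.xcontent.XContentType;

import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked;

// Assumed to live in a test class extending ESIntegTestCase,
// which provides prepareCreate(), client(), indexRandom() and flush()
public void prepESCluster() throws Exception {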
    Random random = new Random();

    // Create a new index
    String mapping = XContentFactory.jsonBuilder().startObject().startObject("type")
            .startObject("properties")
            // "string" is deprecated in 5.x; "text" replaces it for full-text fields
            .startObject("content").field("type", "text").endObject()
            .startObject("someField").field("type", "text").endObject()
            .startObject("score").field("type", "double").endObject()
            .startObject("original_score").field("type", "double").endObject()
            .endObject().endObject().endObject()
            .string();
    assertAcked(prepareCreate("test")
            .addMapping("type", mapping, XContentType.JSON));
    List<IndexRequestBuilder> indexBuilders = new ArrayList<>();
    // Index 10 records (0..9)
    for (int i = 0; i < CONTENT.length; i++) {
        indexBuilders.add(
                client().prepareIndex("test", "type", Integer.toString(i))
                        .setSource(XContentFactory.jsonBuilder().startObject()
                                .field("content", CONTENT[i])
                                .field("original_score", random.nextDouble())
                                .endObject()));
    }
    // Index two records that have no "content" field (only someField)
    for (int i = 0; i < 2; i++) {
        indexBuilders.add(
                client().prepareIndex("test", "type", Integer.toString(i + CONTENT.length))
                        .setSource(XContentFactory.jsonBuilder().startObject()
                                .field("someField", CONTENT[i])
                                .field("original_score", random.nextDouble())
                                .endObject()));
    }


    indexRandom(true, indexBuilders);
    flush("test");

}

The CONTENT variable is a String[] of sentences that my queries match against. Inside the plugin (IMPORTANT: I designed the 5.5 plugin based on this new model of plugin building), I need to access these fields, but I'm running into a few issues. It took me a long time to get to the bottom of this, but here's what I found. The following code inside the runAsDouble() method:

IndexableField source = context.reader().document(currentDocid).getField("_source");

Produces this result:

stored<_source:[7b 22 63 6f 6e 74 65 6e 74 22 3a 22 62 65 61 63 68 20 70 61 72 74 69 65 73 22 2c 22 63 61 6d 70 61 69 67 6e 5f 69 6e 66 6f 72 6d 61 74 69 6f 6e 2e 6f 72 69 67 69 6e 61 6c 5f 7a 73 63 6f 72 65 22 3a 30 2e 34 37 35 36 35 32 33 33 36 31 36 35 39 35 30 34 7d]>

Decoded as UTF-8, those bytes are the document's JSON source, which this call returns:

context.reader().document(currentDocid).getField("_source").binaryValue().utf8ToString();
{"content":"beach parties","campaign_information.original_zscore":0.4756523361659504}
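The bracketed hex in the stored field's output is just the raw UTF-8 bytes of the JSON source, and BytesRef.utf8ToString() decodes them in one step. As a JDK-only illustration of that relationship, here is a small sketch; HexSourceDecoder and decodeHexDump are hypothetical names, not part of Lucene or Elasticsearch.

```java
import java.nio.charset.StandardCharsets;

public class HexSourceDecoder {
    // Hypothetical helper: turns a space-separated hex dump (like the one
    // inside stored<_source:[...]>) back into bytes, then decodes them as UTF-8.
    public static String decodeHexDump(String hexDump) {
        String[] parts = hexDump.trim().split("\\s+");
        byte[] bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // parseInt accepts "00".."ff"; the cast keeps the low 8 bits
            bytes[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The first ten bytes of the dump above
        System.out.println(decodeHexDump("7b 22 63 6f 6e 74 65 6e 74 22"));
    }
}
```

Running main prints {"content", the opening of the decoded source shown above.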

Hey! Excellent, content and campaign_information.original_zscore are exactly the fields I need. My problem now is that the source is a string instead of a map. Lucene/ES doesn't play well with third-party libraries like Gson and Jackson, and parsing documents that will be MUCH larger than this one as raw strings (using the method above) will be a bear.
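To make the target concrete: for a flat source like the one above, the JSON string can be turned into a key/value map with nothing but the JDK. The sketch below is a hypothetical, hand-rolled helper (FlatSourceParser is not a Lucene or Elasticsearch class). It assumes a single-level object whose values are strings or numbers only; nested objects, arrays, booleans, null and most escape sequences are out of scope, so it illustrates the desired map shape rather than offering a drop-in parser for larger documents.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlatSourceParser {
    // Hypothetical helper: parses a single-level JSON object whose values are
    // strings or numbers. Nested objects, arrays, true/false/null and escapes
    // other than \" and \\ are NOT supported; malformed input throws.
    public static Map<String, Object> parseFlat(String json) {
        Map<String, Object> out = new LinkedHashMap<>(); // keeps source key order
        int[] pos = {0};                                 // shared cursor into json
        skipWs(json, pos);
        expect(json, pos, '{');
        skipWs(json, pos);
        if (json.charAt(pos[0]) == '}') {
            return out; // empty object
        }
        while (true) {
            skipWs(json, pos);
            String key = readString(json, pos);
            skipWs(json, pos);
            expect(json, pos, ':');
            skipWs(json, pos);
            // A leading quote means a string value; anything else is read as a number
            Object value;
            if (json.charAt(pos[0]) == '"') {
                value = readString(json, pos);
            } else {
                value = readNumber(json, pos);
            }
            out.put(key, value);
            skipWs(json, pos);
            char c = json.charAt(pos[0]++);
            if (c == '}') {
                return out;
            }
            if (c != ',') {
                throw new IllegalArgumentException("Expected ',' or '}' at index " + (pos[0] - 1));
            }
        }
    }

    private static void skipWs(String s, int[] pos) {
        while (pos[0] < s.length() && Character.isWhitespace(s.charAt(pos[0]))) {
            pos[0]++;
        }
    }

    private static void expect(String s, int[] pos, char c) {
        if (s.charAt(pos[0]) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at index " + pos[0]);
        }
        pos[0]++;
    }

    // Reads a quoted string; a backslash makes the next character literal
    private static String readString(String s, int[] pos) {
        expect(s, pos, '"');
        StringBuilder sb = new StringBuilder();
        while (true) {
            char c = s.charAt(pos[0]++);
            if (c == '"') {
                return sb.toString();
            }
            if (c == '\\') {
                c = s.charAt(pos[0]++);
            }
            sb.append(c);
        }
    }

    // Reads a JSON number as a double
    private static double readNumber(String s, int[] pos) {
        int start = pos[0];
        while (pos[0] < s.length() && "+-0123456789.eE".indexOf(s.charAt(pos[0])) >= 0) {
            pos[0]++;
        }
        return Double.parseDouble(s.substring(start, pos[0]));
    }

    public static void main(String[] args) {
        String source = "{\"content\":\"beach parties\","
                + "\"campaign_information.original_zscore\":0.4756523361659504}";
        Map<String, Object> fields = parseFlat(source);
        System.out.println(fields.get("content"));
        System.out.println(fields.get("campaign_information.original_zscore"));
    }
}
```

Running main prints beach parties followed by the z-score. Anything deeper than one level would need a real JSON parser, which is exactly the concern raised above.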

I haven't found any methods that easily extract the key/value pairs from the stored source of these documents, and it's the last piece of the puzzle I need for my plugin to work properly. I've also tried getting the values with this method:

context.reader().document(currentDocid).getValues("_source")

and it returns null.

Any suggestions?
