1.3.4/Java: how can I parse a ClusterState object from its XContent representation?

mrec · October 7, 2015, 2:23pm

Newbie here. I want to take a ClusterState retrieved as usual via ClusterAdminClient, serialize it to a human-readable file via toXContent, and then recreate it from that file at a later date. Serializing is straightforward, but after several hours of Googling and hunting through the ES source, I can't see a way to deserialize it again. XContentParser looks as if it can parse out weakly typed maps etc., but not specific types like ClusterState.

Serializing/deserializing the ClusterState's MetaData would also be fine (that's the only part I'm interested in), but since MetaData doesn't even implement ToXContent, that looked like a non-starter.

Background/context: I'm trying to build unit tests around an existing system that uses index mapping metadata to drive logic. For obvious reasons I don't want to make live ES calls when running unit tests, so the plan was to use canned data instead. Since real metadata is often huge, I'd quite like to be able to cut it down or construct minimal mappings to test particular edge cases; I'd also like to be able to see the canned data driving a particular test. This is why I really want human readability: ClusterState.Builder.writeTo/readFrom work fine but produce unreadable binary soup.

Am I barking up the wrong tree, and/or fundamentally misunderstanding what the XContent format is about? I got the impression that it's an abstraction of serialization formats supporting round-tripping, but as I said, newbie.

jprante · October 7, 2015, 3:18pm

You do not need to parse the MetaData object; there is an API for that.

Instead, you can try something like this, for example to get the index settings from the cluster state into a map:

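    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.elasticsearch.action.admin.cluster.state.ClusterStateRequestBuilder;
    import org.elasticsearch.action.admin.cluster.state.ClusterStateResponse;
    import org.elasticsearch.client.Client;
    import org.elasticsearch.cluster.metadata.IndexMetaData;
    import org.elasticsearch.cluster.metadata.MetaData;
    import org.elasticsearch.common.settings.SettingsFilter;
    import org.elasticsearch.common.xcontent.XContentBuilder;
    import org.elasticsearch.common.xcontent.XContentFactory;

    // Returns index name -> JSON string of that index's (filtered) settings.
    public static Map<String, String> getSettings(Client client, SettingsFilter settingsFilter, String... index) throws IOException {
        Map<String, String> settings = new HashMap<>();
        ClusterStateRequestBuilder request = client.admin().cluster().prepareState()
                .setIndices(index);
        ClusterStateResponse response = request.execute().actionGet();
        MetaData metaData = response.getState().metaData();
        if (!metaData.getIndices().isEmpty()) {
            // filter out the settings from the metadata
            for (IndexMetaData indexMetaData : metaData) {
                final XContentBuilder builder = XContentFactory.jsonBuilder();
                builder.startObject();
                for (Map.Entry<String, String> entry :
                        settingsFilter.filterSettings(indexMetaData.getSettings()).getAsMap().entrySet()) {
                    builder.field(entry.getKey(), entry.getValue());
                }
                builder.endObject();
                settings.put(indexMetaData.getIndex(), builder.string());
            }
        }
        return settings;
    }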

mrec · October 7, 2015, 3:35pm

Hi Jörg,

Thanks for the response, but I'm not sure how this answers the question. Your sample code extracts metadata from a ClusterStateResponse; the system I'm trying to test already does that. What I need is the second method below (see the third paragraph of my original post for context):

    // One-time test data setup against a live ES system
    //
    public static void writeClusterState(ClusterState cs, File outputFile) throws IOException {
        XContentBuilder xcb = XContentFactory.jsonBuilder().prettyPrint();
        cs.toXContent(xcb, ToXContent.EMPTY_PARAMS);
        try (FileWriter fw = new FileWriter(outputFile)) {
            fw.write(xcb.string());
        }
    }

    // For subsequent unit tests which DON'T talk to a live ES
    //
    public static ClusterState readClusterState(File inputFile) throws IOException {
        // here a miracle occurs
    }
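
Whatever ends up filling in readClusterState, the file handling on both sides can avoid FileWriter's platform-default charset. Here is a stdlib-only sketch (Java 7+ java.nio.file) that writes and reads fixture text as UTF-8. The FixtureFiles class and its method names are hypothetical, and the Elasticsearch calls are deliberately left out: the text written would be xcb.string(), and the text read back is what would eventually be handed to a parser.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helpers for canned test fixtures; no Elasticsearch calls here.
final class FixtureFiles {
    private FixtureFiles() {}

    // Write the (pretty-printed) XContent text as UTF-8,
    // creating the file or truncating an existing one.
    public static void writeFixture(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    // Read the whole fixture back as UTF-8 text.
    public static String readFixture(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}
```

For example, FixtureFiles.writeFixture(outputFile.toPath(), xcb.string()) could stand in for the FileWriter block in writeClusterState.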

jprante · October 7, 2015, 4:03pm

OK, so you want to instantiate a ClusterState.

Look at org.elasticsearch.action.admin.cluster.state.TransportClusterStateAction and org.elasticsearch.gateway.local.state.meta.LocalGatewayMetaState.

It's something like this:

    // Inside a loop over the global state files (hence the continue);
    // stateFile, version, highestVersion and logger come from the surrounding code.
    byte[] data = Streams.copyToByteArray(new FileInputStream(stateFile));
    if (data.length == 0) {
        logger.debug("[_global] no data for [" + stateFile.getAbsolutePath() + "], ignoring...");
        continue;
    }

    XContentParser parser = null;
    try {
        parser = XContentHelper.createParser(data, 0, data.length);
        metaData = MetaData.Builder.fromXContent(parser);
        highestVersion = version;
    } finally {
        if (parser != null) {
            parser.close();
        }
    }

    ClusterState.Builder builder = ClusterState.builder(currentState.getClusterName());
    ...
    builder.metaData(metaData);
    ...
    ClusterState state = builder.build();

mrec · October 8, 2015, 2:48pm

That looked much more promising, but I still can't get it to work. I currently have:

    public static void writeMetaData(MetaData metaData, File file) throws IOException {
        XContentBuilder builder = XContentFactory.jsonBuilder().prettyPrint();
        MetaData.Builder.toXContent(metaData, builder, ToXContent.EMPTY_PARAMS);
        try (FileWriter fw = new FileWriter(file)) {
            fw.write(builder.string());
        }
    }

    public static MetaData readMetaData(File file) throws IOException {
        byte[] data = Streams.copyToByteArray(file);
        try (XContentParser parser = XContentFactory.xContent(XContentType.JSON).createParser(data)) {
            return MetaData.Builder.fromXContent(parser);
        }
    }

but it consistently throws in MetaData.Builder.fromXContent. What confuses me about XContent is that what it calls JSON output isn't pure JSON; the capture file starts with, e.g.:

    "meta-data"{
      "version" : 0,
      "uuid" : "_na_",
      "templates" : { },
      etc...

Stepping through fromXContent in the debugger, I can see that the preamble parses "meta-data" as a VALUE_STRING. It then hits the START_OBJECT and calls MetaData.lookupFactory(currentFieldName) while currentFieldName is still null, which returns a null factory and appears to skip the entire JSON content.

mrec · October 8, 2015, 3:58pm

Ah! I was so close. I just needed to strip the leading unJSONic "meta-data" from the toXContent output, and it re-parses fine.
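
The stripping step can be a plain string operation applied to the captured text before it reaches the parser. This is a minimal sketch, assuming the capture starts with exactly one quoted field name (such as "meta-data") immediately followed by the object, as in the capture shown earlier in this thread. XContentPreamble and stripLeadingFieldName are hypothetical names, and nothing here calls Elasticsearch.

```java
// Hypothetical helper (no Elasticsearch API involved): strips the bare
// field name that toXContent emits before the top-level object, turning
//   "meta-data"{ ... }   into   { ... }
// Assumes the field name itself contains no escaped quotes.
final class XContentPreamble {
    private XContentPreamble() {}

    static String stripLeadingFieldName(String text) {
        String trimmed = text.trim();
        // Already a plain JSON object: nothing to strip.
        if (trimmed.startsWith("{")) {
            return trimmed;
        }
        if (!trimmed.startsWith("\"")) {
            throw new IllegalArgumentException("expected '{' or a quoted field name at start of input");
        }
        // Skip past the closing quote of the field name.
        int closingQuote = trimmed.indexOf('"', 1);
        if (closingQuote < 0) {
            throw new IllegalArgumentException("unterminated field name");
        }
        String rest = trimmed.substring(closingQuote + 1).trim();
        if (!rest.startsWith("{")) {
            throw new IllegalArgumentException("expected '{' after the field name");
        }
        return rest;
    }
}
```

Returning already-clean input unchanged makes the helper safe to run on every fixture. In readMetaData, the bytes passed to createParser would then be the UTF-8 bytes of the stripped text rather than the raw file contents.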

It's a bit weird that the format doesn't round-trip "as is", but this is eminently liveable. Many thanks for your pointers; I wouldn't have got there without them.