Is it allowed to do an upsert with 'scripted_upsert' from Cascading?


(Sebastian Stjerne) #1

I'm trying to upsert a document from Cascading, and I expect the script to be executed. It is not executed on the initial run, but when I try again (the document already exists) the script is executed.

//Setup

	Properties props = new Properties();
	props.setProperty(ConfigurationOptions.ES_NODES, properties.getProperty("es.nodes"));
	props.setProperty(ConfigurationOptions.ES_WRITE_OPERATION, ConfigurationOptions.ES_OPERATION_UPSERT);
	props.setProperty(ConfigurationOptions.ES_UPDATE_SCRIPT, "upsert-test");
	props.setProperty(ConfigurationOptions.ES_UPDATE_SCRIPT_LANG, "groovy");
	props.setProperty(ConfigurationOptions.ES_UPDATE_SCRIPT_PARAMS_JSON, "{  param1 : 1.2  }");

	Hfs inTap = new Hfs(new TextDelimited(false, "\n"), inputPath);
	EsTap outTap = new EsTap("/test/test", Fields.ALL);
	ScrubFunction scrubFunction = new ScrubFunction("id","test1","test2");
	Pipe processPipe = new Each("processPipe", scrubFunction, Fields.RESULTS);

	new Hadoop2MR1FlowConnector(props).connect(inTap, outTap, processPipe).complete();

upsert-test.groovy

	import org.elasticsearch.common.logging.*;
	ESLogger logger = ESLoggerFactory.getLogger('update-weights');
	logger.info('Entering');
	// test3 = test1 + test2, computed from the stored document
	def test1 = ctx._source.'test1'
	def test2 = ctx._source.'test2'
	ctx._source.'test3' = test1 + test2

The log on the first run:
[2016-03-10 12:42:56,407][INFO ][cluster.metadata ] [Synch] [test] update_mapping [test] (dynamic)
Result:

	{
		_index: "test",
		_type: "test",
		_id: "1",
		_score: 1,
		_source: {
				test1: 1,
				test2: 2
			}
	}

On the second run:
[2016-03-10 12:52:53,833][INFO ][upsert-test ] Entering
Result:

	{
		_index: "test",
		_type: "test",
		_id: "1",
		_score: 1,
		_source: {
				test1: 1,
				test2: 2,
				test3: 3
			}
	}

(Costin Leau) #2

That matches the behaviour in ES (which is what actually executes the script). To quote the ES docs:

If the document does not already exist, the contents of the upsert element
will be inserted as a new document. If the document does exist, then the
script will be executed instead:

If the behaviour in ES-Hadoop seems odd, try the REST client against vanilla ES to see whether it differs.
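
To make the quoted rule concrete, here is a minimal sketch of the plain upsert semantics as a toy in-memory model. It is not Elasticsearch or ES-Hadoop code: `UpsertModel`, `update` and `addFields` are hypothetical names invented for illustration, and `addFields` only imitates what upsert-test.groovy does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Toy model of the quoted rule; NOT Elasticsearch code, all names are hypothetical.
class UpsertModel {
    private final Map<String, Map<String, Object>> index = new HashMap<>();

    // Missing id: store the upsert document as-is and skip the script.
    // Existing id: run the script against the stored source.
    void update(String id, Map<String, Object> upsertDoc,
                Consumer<Map<String, Object>> script) {
        Map<String, Object> source = index.get(id);
        if (source == null) {
            index.put(id, new HashMap<>(upsertDoc));
            return;
        }
        script.accept(source);
    }

    Map<String, Object> get(String id) {
        return index.get(id);
    }

    // Imitates upsert-test.groovy: test3 = test1 + test2
    static void addFields(Map<String, Object> src) {
        src.put("test3", (Integer) src.get("test1") + (Integer) src.get("test2"));
    }

    public static void main(String[] args) {
        UpsertModel es = new UpsertModel();
        Map<String, Object> doc = new HashMap<>();
        doc.put("test1", 1);
        doc.put("test2", 2);

        es.update("1", doc, UpsertModel::addFields);
        System.out.println(es.get("1").containsKey("test3")); // false: first run only inserts

        es.update("1", doc, UpsertModel::addFields);
        System.out.println(es.get("1").containsKey("test3")); // true: second run executes the script
    }
}
```

Under this model the first run stores only test1 and test2, and test3 appears on the second run, which is the pattern reported in the first post.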


(Andres) #3

Hi @costin, it seems neither of the two upsert options is working with the ES-Hadoop connector (2.2.0).
There is no way to set the "upsert" element, and 'scripted_upsert': true doesn't have any effect.

Do you know if these features will be available in the near future?
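
For context, this is roughly what the two options look like in a raw update request body sent directly to Elasticsearch. The sketch below only assembles the JSON text and does not contact a cluster. `UpdateBody` and `build` are hypothetical names, and the `"file"` script key and the `/test/test/1/_update` endpoint are assumptions based on the ES 2.x update API, so check them against the docs for your version.

```java
// Hypothetical helper: builds the body for e.g. POST /test/test/1/_update (assumed ES 2.x syntax).
class UpdateBody {
    static String build(String scriptFile, String lang,
                        String upsertJson, boolean scriptedUpsert) {
        StringBuilder sb = new StringBuilder("{");
        if (scriptedUpsert) {
            // Asks ES to run the script even when the document does not exist yet.
            sb.append("\"scripted_upsert\":true,");
        }
        sb.append("\"script\":{\"file\":\"").append(scriptFile)
          .append("\",\"lang\":\"").append(lang).append("\"},");
        // Document inserted when the id is missing.
        sb.append("\"upsert\":").append(upsertJson).append("}");
        return sb.toString();
    }

    public static void main(String[] args) {
        String doc = "{\"test1\":1,\"test2\":2}";
        System.out.println(build("upsert-test", "groovy", doc, false)); // plain upsert
        System.out.println(build("upsert-test", "groovy", doc, true));  // scripted upsert
    }
}
```

Sending both variants with a REST client against a fresh document id is one way to compare vanilla ES with what the connector produces.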


(Costin Leau) #4

Sorry, I don't follow. What two upsert options? As I quoted from the ES docs, if the doc doesn't exist, it will be inserted.
The script is executed only if the document is updated.


(Andres) #5

@costin you're right, if the doc doesn't exist, it will be inserted, and the 'scripted_upsert': true feature will be included soon, as you mentioned in another post. Thanks.

