Proper way to index a document with many child documents?


#1

Hey!

I just moved to ES because I'm building a larger-scale application that would eventually get slow on MySQL.

The problem is that I have no idea how to properly index a document that must be easy to extend in the future with multiple child documents in different "categories".

Here is a quick example of the document structure:

{
	name: 'My campaign',
	user_id: 1,
	settings: {
		'enable_tracking': true,
		'enable_logs': true
	},
	sessions: [
		{
			ip: '127.0.0.1',
			url: 'http://127.0.0.1',
			timestamp: '...',
			session_data: {
				type: 'desktop',
				browser: 'Mozilla Firefox',
				referrer_type: 'direct',
				referrer_url: null
			}
		},
		...
	],
	logs: [
		{action: 'Campaign created', timestamp: '...'},
		...
	]
}

Each campaign could have thousands of rows of data in the future.
Is it wise to index everything inside one document like above, or should it be split up, something like this?

{
	type: 'campaign',
	name: 'My campaign',
	user_id: 1,
	campaign_id: 1
}

{
	type: 'settings',
	'enable_tracking': true,
	'enable_logs': true,
	campaign_id: 1
}

{
	type: 'session',
	ip: '127.0.0.1',
	url: 'http://127.0.0.1',
	timestamp: '...',
	session_data: {
		type: 'desktop',
		browser: 'Mozilla Firefox',
		referrer_type: 'direct',
		referrer_url: null
	},
	campaign_id: 1
}

{
	type: 'log',
	action: 'Campaign created',
	timestamp: '...',
	campaign_id: 1
}

The MySQL relational way is the only way I have ever worked, so I want to avoid mistakes that could hurt this project in the future.

I want it to be easy to pull, insert and update data separately for each "category" without losing performance. Could anybody help point me in the right direction?

Thanks!


(Shane Connelly) #2

As a general rule of thumb, Elasticsearch works best when the data is denormalized where possible. So (without any other information), I'd index all the sessions as separate documents in an index. I'd also consider extending your second example by embedding into each session document as much of the "campaign" information as your typical queries need.
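
To make that concrete, here is a minimal sketch in Python of what embedding campaign information into each session could look like. The helper names (`build_session_doc`, `to_bulk_lines`), the `campaign_name` field and the `sessions` index name are assumptions for illustration, not anything from the original post; which campaign fields to copy depends on the queries you actually run. The second helper only builds the newline-delimited JSON body that the `_bulk` endpoint expects and does not send anything.

```python
import json


def build_session_doc(campaign, session):
    """Return one flat session document with campaign fields copied in."""
    doc = dict(session)  # shallow copy so the caller's dict is not mutated
    doc["type"] = "session"
    doc["campaign_id"] = campaign["campaign_id"]
    # Denormalized copies of campaign fields (hypothetical choice of fields).
    doc["campaign_name"] = campaign["name"]
    doc["user_id"] = campaign["user_id"]
    return doc


def to_bulk_lines(index_name, docs):
    """Serialize documents as newline-delimited JSON for a _bulk request."""
    lines = []
    for doc in docs:
        # Action line, followed by the document source on the next line.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


campaign = {"campaign_id": 1, "name": "My campaign", "user_id": 1}
session = {
    "ip": "127.0.0.1",
    "url": "http://127.0.0.1",
    "session_data": {"type": "desktop", "browser": "Mozilla Firefox"},
}
body = to_bulk_lines("sessions", [build_session_doc(campaign, session)])
```

The trade-off is the usual one with denormalization: session queries can filter or aggregate on campaign fields without a second lookup, but renaming a campaign means updating every session document that carries the copy.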

