Is Elasticsearch the right technology for this task?


#1

I have a project with JSON documents that resemble social media posts. They have tags, a title, an author, media, a location, etc. Here is a concrete example of what a document could look like:

    {
		"id": "5af825e5-6a0e-46be-8680-3903018e3986",
		"caption": "This is a caption of a #sample #document.",
		"countComments": 0,
		"creationTime": 1530996262,
		"creator": 
		{
			"id": "46618d53-aec7-40d3-9d9f-3671e7de48d7",
			"followerCount": 213,
			"followsCount": 21,
			"lastUpdated": 1531082958,
			"mediaCount": 12,
			"profilePictureUrl": "https://url-to-profile-pic.jpg",
			"url": "https://www.url.com/users/userXYZ/",
			"username": "userXYZ"
		},
		"downvotes": null,
		"location": 
		{
			"labels": [ "city", "country"],
			"lat": 1.111111,
			"lon": 2.222222,
			"title": "location title"
		},
		"medias": [
			{
				"creationTime": 1530996262,
				"height": 640,
				"width": 640,
				"id": "7008863b-19d6-4a70-b3ec-19788e47aa4b",
				"score": 121,
				"thumbnailUrl": "https://url.com/media/thumbnail.jpg",
				"type": "IMG",
				"url": "https://url.com/media/mediaurl.jpg"
			}
		],
		"score": 1224,
		"comments": [],
		"tags": [
			"tagOne",
			"tagTwo",
			"tagThree",
			"tagFour",
			"tagFive",
			"tagSix"
		],
		"title": "title of the post",
		"upvotes": 217,
		"url": "https://www.url.com/posts/thispost"
	}

I am now looking for a document store that can run complex queries (on arrays and number ranges, for example) with fast response times. This is a sample query I want to execute:

    SELECT documents FROM store WHERE
		creationTime BETWEEN numberA AND numberB
		AND upvotes BETWEEN numberC AND numberD
		AND creator.followerCount BETWEEN numberE AND numberF
		AND location.labels CONTAINS "city"
		AND tags CONTAINS "tagOne" AND tags CONTAINS "tagTwo"
		LIMIT 16 OFFSET 0

I am well aware that the documents need to be indexed. The arrays contain only strings, not complex (nested) objects.

The number of documents will potentially grow into the billions, so sharding the indexes and horizontal scaling are a must-have. Ideally I want a response in under 5 seconds for a query like the one above (which I assume means big clusters).

Is Elasticsearch capable of this, or even an option for this task?


(ddorian43) #2

Can you reduce location.labels to a single flat array, not nested? If yes, then you're good to go.

See https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-search-speed.html for tuning for search.
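
To make the "flat array, not nested" point concrete, here is a minimal mapping sketch built as a plain Python dictionary. It assumes the typeless mapping format of Elasticsearch 7 and later, and the choice of `long`, `integer`, and `keyword` types is an assumption based on the sample document, not something stated in this thread. Elasticsearch has no dedicated array type: any field may hold multiple values, so `tags` and `location.labels` are simply declared as `keyword`.

```python
import json

# Hypothetical mapping sketch; field names follow the sample document above.
# "keyword" indexes each string as one exact term, which is what term
# filters match against. Arrays need no special declaration.
mapping = {
    "mappings": {
        "properties": {
            "creationTime": {"type": "long"},      # epoch seconds kept as a number
            "upvotes": {"type": "integer"},
            "tags": {"type": "keyword"},           # array of plain strings
            "creator": {
                "properties": {
                    "followerCount": {"type": "integer"},
                }
            },
            "location": {
                # A regular object field (not "nested"), so labels stays a flat array.
                "properties": {
                    "labels": {"type": "keyword"},
                }
            },
        }
    }
}

# The JSON below is what would be sent as the body of an index-creation request.
print(json.dumps(mapping, indent=2))
```

Fields that are never queried (URLs, media details) could be left out of the sketch or mapped as non-indexed; that trade-off is not covered here.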


#3

Sure, I can. Does that mean that any number n of trivial string arrays can be indexed and queried with a near-linear increase in query time? Or does query time grow faster than that as the number of trivial string arrays in the documents increases? For example, if I have 5 trivial string arrays in a document that I want to index and query, can I expect reasonable query times?


(ddorian43) #4

Yep, term filters are the fastest. Indexing speed will depend on the length of the arrays, while search speed will depend on the number and length of the terms that you're searching for.
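
As a rough illustration, the sample query from the first post could be expressed as a `bool` query whose `filter` list holds `range` and `term` clauses. This is only a sketch: `build_query` is a hypothetical helper, and it assumes that `likes` and `author.follower` in the pseudo-query correspond to `upvotes` and `creator.followerCount` in the sample document. Requiring several tags at once is expressed as one `term` clause per tag, since all clauses in `filter` must match.

```python
import json

def build_query(created, upvotes, followers, label, tags, size=16, offset=0):
    """Build a search body that uses only filter clauses (no relevance scoring).

    created, upvotes and followers are (low, high) tuples, inclusive.
    """
    filters = [
        {"range": {"creationTime": {"gte": created[0], "lte": created[1]}}},
        {"range": {"upvotes": {"gte": upvotes[0], "lte": upvotes[1]}}},
        {"range": {"creator.followerCount": {"gte": followers[0], "lte": followers[1]}}},
        {"term": {"location.labels": label}},
    ]
    # One term clause per tag: every clause in "filter" must match (logical AND).
    filters += [{"term": {"tags": tag}} for tag in tags]
    return {"from": offset, "size": size, "query": {"bool": {"filter": filters}}}

# Example values are placeholders, not data from this thread.
body = build_query(
    created=(1530000000, 1531000000),
    upvotes=(100, 500),
    followers=(50, 1000),
    label="city",
    tags=["tagOne", "tagTwo"],
)
print(json.dumps(body, indent=2))
```

The resulting dictionary is the JSON body of a search request. Note that `from`/`size` paging is capped by the `index.max_result_window` setting (10,000 by default), so very deep offsets would need a different paging approach.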

