Adding a label to the dataset - Multiple entries (large scale)

Hey,
I'm querying a large API dataset with ~200k different tokens, and I'm trying to query a subset of the activity by token, covering about 170k values.
I've tried using a filter for multiple values ("not in 30k"), but I get various failure responses. Any suggestions on how to proceed?

Thanks for posting for the first time. Welcome, @kaganova!

I have a few follow-up questions here:

  • What version are you using?
  • Do you have any further context on what you are looking to accomplish versus the results you are getting?
  • Do you have a code sample you can share?
  • What errors are you getting?

Best,
Jess

Hey!
Thanks for the reply, and happy to become a community member!
I'll try to better explain where I stand.
I have a fairly active API log, with over 34m queries daily. However, queries are logged with the authenticating API token only, while the token metadata is stored in a different table. We have over 200k tokens, 70k active daily.

What I'm trying to do is query the log based on token groups (let's assume: all the males, all the females). In order to do this I've pulled the list of all "male" token users and added it to an "is one of" filter.

I've limited the timeframe significantly (1 hour), but I'm still getting an error loading the data. When I tried other methods of updating the filter, I got a 413 error.

I'm using v 7.17.3.

Happy for any advice :slight_smile:


@kaganova Thanks for the follow-up and more information here. You are getting a 413 error, which usually means the request is too large.

Do you have the full text of the error you are getting? Are the tokens you are trying to search for only in the separate metadata index, or have they been enriched onto the same index you are trying to aggregate to get a bar chart?

It's important to note that you can't join indexes in Elasticsearch like you can join tables in a traditional database. Have you enriched your index with the metadata you need to ensure that the filter will reduce its volume?
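For reference, an "is one of" filter with that many values compiles to a terms query along the lines of the sketch below (the index and field names here are placeholders, not your actual mapping). A terms query is capped at 65,536 values by default (`index.max_terms_count`), and very large request bodies can be rejected outright, so a ~170k-value list will fail well before any data loads:

```json
# Hypothetical example - adjust the index, field, and token values to your data
POST /api-logs/_search
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-1h" } } },
        { "terms": { "token": ["tok_00001", "tok_00002", "..."] } }
      ]
    }
  }
}
```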

Best,
Jessica

Hi Jessica

Thanks for the reply.
Yes, I'm clear that I cannot join indexes.
I'm also aware that I'm using a very cumbersome method, hence the error. I actually don't get anything concrete: as you can see in the screenshot, the system just goes idle after a while...

How can I enrich my index with the metadata to reduce the filter size?

Is each document you are searching associated with a single token, or can there be multiple tokens associated with a document?

How large are the token metadata records in terms of fields? How large is the token metadata index (primary index size - assuming you have it in Elasticsearch)?

Is token metadata ever updated? If so, how frequently? How frequently is token metadata added/removed?


Hey!

I only have a single token per query.

I don't have any token metadata stored in Elasticsearch.
It's quite slim in terms of metadata: it has a mandatory 1/0-type attribute, and I'd potentially be happy to add a numeric identifier as well.

If you had this data in Elasticsearch you could create an ingest pipeline with an enrich processor which would add the appropriate token metadata to each record. This way you could retrieve records by directly filtering on token metadata and not have to pass in the list of tokens.
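As a sketch, assuming the metadata were indexed into a `token-metadata` index with `token` and `gender` fields (all names below are made up for illustration), the setup could look something like this in 7.17:

```json
# Hypothetical names throughout - adapt to your own indices and fields

# 1. Define an enrich policy that matches log records to metadata by token
PUT /_enrich/policy/token-metadata-policy
{
  "match": {
    "indices": "token-metadata",
    "match_field": "token",
    "enrich_fields": ["gender"]
  }
}

# 2. Execute the policy to build its internal enrich index
POST /_enrich/policy/token-metadata-policy/_execute

# 3. Create an ingest pipeline that copies the metadata onto each log document
PUT /_ingest/pipeline/token-enrich
{
  "processors": [
    {
      "enrich": {
        "policy_name": "token-metadata-policy",
        "field": "token",
        "target_field": "token_meta"
      }
    }
  ]
}
```

With this pipeline set as the log index's `default_pipeline`, each new document would carry something like `token_meta.gender`, and your dashboard filter becomes a single term filter instead of a 170k-value list.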

You could also enrich the documents with this data outside of Elasticsearch before you index them, but that depends on what your ingest data flow looks like.
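For example, if your logs pass through Logstash on the way in, a translate filter could do the same lookup from a local dictionary file (the path and field names here are assumptions):

```
filter {
  translate {
    field           => "token"                          # lookup key from the event
    destination     => "gender"                         # enriched field written to the event
    dictionary_path => "/etc/logstash/token_gender.yml" # token -> gender mapping file
    fallback        => "unknown"                        # value used when the token is not found
  }
}
```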


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.