Adding a label to the dataset - Multiple entries (large scale)

Hey,
I'm querying a large API dataset with ~200k different tokens, and I'm trying to query a subset of the activity by token, covering about 170k values.
I've tried using a filter for multiple values ("not in 30k"), but I get various failure responses. Any suggestions on how to proceed?

Thanks for posting for the first time. Welcome, @kaganova!

I have a few follow-up questions here:

  • What version are you using?
  • Do you have any further context on what you are looking to accomplish versus the results you are getting?
  • Do you have a code sample you can share?
  • What errors are you getting?

Best,
Jess

Hey!
Thanks for the reply, and happy to become a community member!
I'll try to better explain where I stand.
I have a fairly active API log, with over 34m queries daily. However, queries are logged with the authenticating API token only, while the token metadata is stored in a different table. We have over 200k tokens, 70k active daily.

What I'm trying to do is query the log based on token groups (let's assume: all the males, all the females). In order to do this I've pulled the list of all "male" token users and added it to an "is one of" filter.

I've limited the timeframe significantly (1 hour), but I'm still getting an error loading the data. When I tried other methods of updating the filter, I got a 413 error.

I'm using v 7.17.3.

Happy for any advice :slight_smile:


@kaganova Thanks for the follow-up and more information here. You are getting a 413 error, which usually means the request is too large.

Do you have the full text of the error you are getting? Are the tokens you are trying to search for only in the separate metadata index, or have they been enriched onto the same index you are trying to aggregate to get a bar chart?

It's important to note that you can't join indexes in Elasticsearch like you can join tables in a traditional database. Have you enriched your index with the metadata you need to ensure that the filter will reduce its volume?
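For reference, an "is one of" filter with that many values compiles to a terms query along the lines of the sketch below (the index and field names here are placeholders, not your actual mapping). A terms query is capped at 65,536 values by default (`index.max_terms_count`), and very large request bodies can be rejected outright, so a ~170k-value list will fail well before any data loads:

```json
# Hypothetical example - adjust the index, field, and token values to your data
POST /api-logs/_search
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-1h" } } },
        { "terms": { "token": ["tok_00001", "tok_00002", "..."] } }
      ]
    }
  }
}
```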

Best,
Jessica

Hi Jessica

Thanks for the reply.
Yes, I'm clear that I cannot join indexes.
I'm also aware that I'm using a very cumbersome method, hence the error. I actually don't get anything concrete: as you can see in the screenshot, the system just goes idle after a while...

How can I enrich my index with the metadata to reduce the filter size?

Is each document you are searching associated with a single token, or can there be multiple tokens associated with a document?

How large are the token metadata records in terms of fields? How large is the token metadata index (primary index size - assuming you have it in Elasticsearch)?

Is token metadata ever updated? If so, how frequently? How frequently is token metadata added/removed?


Hey!

I only have a single token per query.

I don't have any token metadata stored in Elasticsearch.
It's quite slim in terms of metadata: it has a mandatory 1/0-type attribute, and I'd potentially be happy to add a numeric identifier as well.

If you had this data in Elasticsearch you could create an ingest pipeline with an enrich processor which would add the appropriate token metadata to each record. This way you could retrieve records by directly filtering on token metadata and not have to pass in the list of tokens.
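As a sketch, assuming the metadata were indexed into a `token-metadata` index with `token` and `gender` fields (all names below are made up for illustration), the setup could look something like this in 7.17:

```json
# Hypothetical names throughout - adapt to your own indices and fields

# 1. Define an enrich policy that matches log records to metadata by token
PUT /_enrich/policy/token-metadata-policy
{
  "match": {
    "indices": "token-metadata",
    "match_field": "token",
    "enrich_fields": ["gender"]
  }
}

# 2. Execute the policy to build its internal enrich index
POST /_enrich/policy/token-metadata-policy/_execute

# 3. Create an ingest pipeline that copies the metadata onto each log document
PUT /_ingest/pipeline/token-enrich
{
  "processors": [
    {
      "enrich": {
        "policy_name": "token-metadata-policy",
        "field": "token",
        "target_field": "token_meta"
      }
    }
  ]
}
```

With this pipeline set as the log index's `default_pipeline`, each new document would carry something like `token_meta.gender`, and your dashboard filter becomes a single term filter instead of a 170k-value list.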

You could also enrich the documents with this data outside of Elasticsearch before you index them, but that depends on what your ingest data flow looks like.
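For example, if your logs pass through Logstash on the way in, a translate filter could do the same lookup from a local dictionary file (the path and field names here are assumptions):

```
filter {
  translate {
    field           => "token"                          # lookup key from the event
    destination     => "gender"                         # enriched field written to the event
    dictionary_path => "/etc/logstash/token_gender.yml" # token -> gender mapping file
    fallback        => "unknown"                        # value used when the token is not found
  }
}
```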


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.