Regex: getting all the hashtags and mentions used in all my documents

Hello, I'm using the Kibana console to run two separate queries: one for the hashtags and one for the mentions. The documents are blog entries with a textContent field, which may contain user mentions like @theUserName or @AnotherOne, and hashtags like #helloWorld and #hello2. The queries look like the following one:

GET /xblog/_search
{
"_source": ["id", "textContent"],
"query": {
"regexp": {
"textContent": {
"value": "@([^-A-Za-z0-9])",
"flags": "ALL"
}
}
}
}

The problem is that it also returns documents that do not contain a @userMention. I think the @ in the regex is being treated as a special symbol, but I couldn't find in the documentation how to escape it.

In the docs, the authors say that you can escape any symbol with double quotes, so I tested:

""@""
But I got nothing.

I also tested expressions I'm used to, like:
/\s([@#][\w_-]+)/g

But that produces multiple errors in Kibana. I tried replacing some parts according to the documentation, but it's still not working.

Can you point me in the right direction?
Thanks in advance,

Hello,

I think the Elasticsearch team can help you more with query issues than the Kibana team. You should post in that Discuss area.

Rather than doing this at query time, which is expensive because the work is repeated every time you run a query, I would recommend extracting hashtags and mentions into separate fields before you index the data. You can then query and filter on these fields, which will be much easier and more performant.
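As a rough illustration of that pre-indexing step, here is a minimal Python sketch. It assumes a hashtag or mention is `#` or `@` followed by letters, digits, underscores or hyphens, and the field names `hashtags` and `mentions` are hypothetical; adapt both to your own data.

```python
import re

# Assumed token rule: '@' or '#' followed by letters, digits, '_' or '-'.
# The lookbehind skips symbols glued to a preceding word character,
# so an e-mail address such as "a@b.com" is not counted as a mention.
TAG_PATTERN = re.compile(r"(?<![\w@#])([@#])([\w-]+)")


def extract_tags(text_content):
    """Return (hashtags, mentions) in order of appearance, without duplicates."""
    hashtags, mentions = [], []
    for symbol, name in TAG_PATTERN.findall(text_content):
        target = hashtags if symbol == "#" else mentions
        if name not in target:
            target.append(name)
    return hashtags, mentions


def enrich(doc):
    """Copy a blog document and add the hypothetical 'hashtags' and 'mentions' fields."""
    hashtags, mentions = extract_tags(doc.get("textContent", ""))
    return {**doc, "hashtags": hashtags, "mentions": mentions}


doc = {"id": 1, "textContent": "Thanks @theUserName and @AnotherOne! #helloWorld #hello2"}
print(enrich(doc))
```

If the enriched documents are indexed with those two fields mapped as keyword, a terms aggregation on each field should give you the full list of hashtags or mentions without any regular expression at query time.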
