Have you ever found yourself in a situation where getting alerted to something happening inside your CosmosDB database would be useful? I’m talking about using raw SQL queries. For example, you would like to get an Azure Monitor alert if a query like SELECT * FROM orders WHERE orders.status = 'failed' returns anything.
Ideally, the original application/service responsible for this domain would raise such an event and act on it. But what if you can’t do that, or you want an alert before the app code gets fixed?
With Azure Monitor Alerts we have a very powerful tool to handle any type of alert incident and take actions on it. However, querying a database like CosmosDB directly is still not supported; check the sources it can query here.
However, there is a low-code solution for this problem: LogicApps. The general idea is to have a LogicApps workflow query the database from time to time using a plain SQL query, make the workflow fail (which will generate a log in Log Analytics), and then create an alert whose source is the Log Analytics query rather than Cosmos directly.
Important note: this solution works not only with Cosmos but with any other database supported by LogicApps connectors, like SQL Server, Postgres, MySQL, Oracle, DB2, etc.
We recommend using the Standard SKU and not Consumption, because most databases are locked down to be accessible only from VNets, and the Consumption plan doesn’t support VNet integration.
Configure your LogicApp to export logs to a Log Analytics workspace
Building the LogicApp workflow
We need to create a new workflow inside our LogicApp Standard that will be triggered by a recurrent schedule. Here, I’m setting up a 1-minute recurrence. This is the entire workflow:
Don’t worry, we will go step-by-step through how to build it; alternatively, you can find its source code here.
1) Create a new stateful workflow (recurrence trigger doesn’t work with stateless workflows):
2) Add a recurrence 1-minute trigger:
3) Now add a CosmosDB -> “Query items” action with the query that should return something that will trigger the alert later on:
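For the example from the beginning of the article, the query in the “Query items” action could simply be the following (container and property names are illustrative; adapt them to your own data):

```sql
-- Returns every document that should fire the alert.
-- "orders" and "status" are illustrative names from the intro example.
SELECT * FROM orders WHERE orders.status = 'failed'
```

Any query works here, as long as it returns rows exactly when the alert condition is true and nothing otherwise.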
4) Add a Condition action that will check if the previous query returned at least 1 row:
5) In the “True” branch of the condition, add a Terminate action to end your workflow with a failure in case the query returned at least 1 item:
6) You can leave the “False” branch of the condition empty; it will finalize your workflow with a Success state (indicating there is nothing “wrong” and an alert should not be created).
7) Now Save your workflow and you should see it executing every minute in the Runs history:
8) As a last step, make sure to export the LogicApps logs to a Log Analytics workspace using Diagnostic Settings (the steps are described in the Pre-requisites section of this article).
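The steps above correspond roughly to a workflow definition like the sketch below. The action names, connection name, database/container IDs, and the `items` output path are assumptions for illustration; the exact shape of the built-in CosmosDB connector may differ in your environment:

```json
{
  "definition": {
    "$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
    "contentVersion": "1.0.0.0",
    "triggers": {
      "Recurrence": {
        "type": "Recurrence",
        "recurrence": { "frequency": "Minute", "interval": 1 }
      }
    },
    "actions": {
      "Query_items": {
        "type": "ServiceProvider",
        "inputs": {
          "parameters": {
            "databaseId": "mydb",
            "containerId": "orders",
            "queryText": "SELECT * FROM orders WHERE orders.status = 'failed'"
          },
          "serviceProviderConfiguration": {
            "connectionName": "cosmosdb",
            "operationId": "QueryDocuments",
            "serviceProviderId": "/serviceProviders/AzureCosmosDB"
          }
        },
        "runAfter": {}
      },
      "Condition": {
        "type": "If",
        "expression": {
          "and": [ { "greater": [ "@length(body('Query_items')?['items'])", 0 ] } ]
        },
        "actions": {
          "Terminate": {
            "type": "Terminate",
            "inputs": {
              "runStatus": "Failed",
              "runError": {
                "code": "AlertConditionMet",
                "message": "The query returned at least one item."
              }
            },
            "runAfter": {}
          }
        },
        "else": { "actions": {} },
        "runAfter": { "Query_items": [ "Succeeded" ] }
      }
    }
  },
  "kind": "Stateful"
}
```

The key part is the Terminate action with `"runStatus": "Failed"`: that is what turns a matching query result into a failed run that Log Analytics can see.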
Creating the Azure Monitor alert
What we have achieved so far is that the LogicApp will export all the run history for the workflow as logs that we can query in Log Analytics. To confirm that, browse to the Log Analytics workspace you created for the LogicApp and run the following query (the workflow runtime logs land in the LogicAppWorkflowRuntime table):
LogicAppWorkflowRuntime
| where WorkflowName == 'query-alert' and OperationName == 'WorkflowRunCompleted' and TimeGenerated >= now(-1d)
You should be able to see the same results that you can see in the LogicApps Portal run history:
That’s what we need to create the Azure Monitor Alert. It will keep querying this data and trigger a new alert in case the last run returns “Failed”.
Now, browse to Azure Monitor -> Alerts -> Create -> New Alert Rule and follow these steps:
1) For the Scope, select your Log Analytics workspace:
2) For Condition, select Signal Type = Log and then “Custom Log Search” as the signal. It will ask for the Kusto query to run; you can use this:
LogicAppWorkflowRuntime
| where WorkflowName == '<YOUR_WORKFLOW_NAME>' and OperationName == 'WorkflowRunCompleted' and TimeGenerated >= now(-1d)
| summarize arg_max(TimeGenerated, Status, Error)
| project Status, TimeGenerated, Error
| where Status == 'Failed'
This query will fetch the latest run for the workflow and will return a row only if the last run is a Failure. In the rest of the Conditions tab, you can specify these settings:
3) The next tab is Actions. Here you can create an Action Group and use all the power that alerts give you:
4) Finally, in the Details tab, I recommend enabling the preview feature named “Automatically resolve alerts”:
This will automatically close the alert when you fix the problem in your database, without requiring you to manually close the alert in the Portal.
Testing the alert
With everything created, let’s make the LogicApp fail and trigger the alert. In my case, I have a container in Cosmos named “orders”, which I am checking for any order with Status = “Failed”. I manually changed one of the items to that status, and the following things happened in sequence:
1) First, the LogicApp started to complete with Failed runs:
2) Give it a couple of minutes (around 1–3 minutes in my case) for the logs to be exported to the Log Analytics workspace, plus the 1-minute evaluation frequency of the alert that we configured earlier. Then the new Alert changed its state to “Triggered” and I got an email in my inbox (that’s the action I configured for the alert, but there are many others):
3) Finally, I went to the database and “fixed” the order status back to “success”. Then, the LogicApp started to return Success in the workflow, the Alert resolved automatically and I got this email from Azure Monitor notifying me that the problem is now resolved:
Conclusion and considerations
I would like to comment on some points and alternatives that you might have after reading this article:
It is possible to use the CosmosDB change feed trigger instead of a Recurrence trigger in the workflow. However, that approach is less flexible: it only works with Cosmos, you can’t run an arbitrary SQL query, and it will be triggered on every document insert or update. If updates are frequent, this might trigger your workflow many times unnecessarily.
The total latency for this solution is around 5 minutes. That is, the time between something happening in the database and the alert firing is around 5 minutes, roughly composed of: the LogicApps recurrence timer (1 minute) + the delay for logs to be exported to Log Analytics (~3 minutes) + the alert evaluation frequency (1 minute).
There are alternatives, like using Azure Functions, to achieve the same thing, but the focus here is to be as low-code as possible.
You could also send the notification directly at the end of the workflow in LogicApps instead of using the Azure Monitor Alert, if you want to save cost. For example, if sending an email to notify someone is enough, that might do the job, as long as there is a LogicApps connector that does what you want. Be aware that you would need to manage a bit of state to avoid sending the same notification every minute over and over (something Azure Monitor handles for you).
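As a sketch of that alternative, assuming the Office 365 Outlook managed connector: an action like the one below would replace the Terminate action in the “True” branch of the condition (the connection reference name, path, and recipient are assumptions for illustration):

```json
"Send_an_email_(V2)": {
  "type": "ApiConnection",
  "inputs": {
    "host": { "connection": { "referenceName": "office365" } },
    "method": "post",
    "path": "/v2/Mail",
    "body": {
      "To": "oncall@example.com",
      "Subject": "Failed orders detected in CosmosDB",
      "Body": "<p>The scheduled query returned at least one failed order.</p>"
    }
  },
  "runAfter": {}
}
```

Since the workflow would then end with Success even when the query matches, remember that every run that finds rows sends a new email; that is the state/deduplication concern mentioned above.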