  • Ishay Wayner

Monitoring Databases

Let’s imagine a futuristic refrigerator, fridge for short. This fridge has an amazing feature: it can recognize the groceries inside it. Even more amazing, after a month or two of regular use (depending on the instructions of the imaginary manufacturer), the fridge will have learned your shopping and consumption habits.


It can then use that knowledge to give you alerts such as, “You have used more milk than usual today; at that pace you will run out of milk by tomorrow” or “You have not had anything to eat today. Is there a problem?”


Ideally, monitoring tools should behave just like this fridge. They should learn the metrics of our database, also known as KPIs (key performance indicators), and alert us when a KPI rises above the high threshold or falls below the low threshold that the tool has learned is normal for our database. The range of values between the low and high thresholds of normal behavior for a KPI is called the KPI baseline.


The baseline is determined by reviewing historical data from a period in which we consider the performance of the database to be normal.
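
As a rough illustration of that idea, here is a minimal Python sketch that derives a baseline from a list of historical readings. The names `Baseline` and `compute_baseline` and the sample values are made up for this example, and taking the plain minimum and maximum is only one simple assumption; a real monitoring tool may use percentiles or something more elaborate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Low and high thresholds of normal behavior for one KPI."""
    low: float
    high: float


def compute_baseline(samples):
    """Derive a baseline from readings taken during a period judged normal.

    Assumption: the observed minimum and maximum serve as the thresholds.
    """
    if not samples:
        raise ValueError("need at least one historical sample")
    return Baseline(low=min(samples), high=max(samples))


# Hypothetical CPU utilization readings (percent) from a normal period.
history = [20.0, 35.5, 48.0, 70.0, 61.2]
baseline = compute_baseline(history)
```

With these made-up readings, the baseline runs from the lowest to the highest value seen in the period.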


A good example to look at is CPU utilization. Let's say that for the past 3 months the CPU utilization of our database has stayed between 20 and 70 percent. From that we can infer that if the CPU utilization exceeds 70 percent, there might be a problem with our application; maybe a new application feature was released that contains a query that needs to be optimized. Knowing that as soon as possible can be crucial to preventing future problems related to CPU bottlenecks.


Dropping below the low threshold can also be a sign of a problem with the application. If the CPU utilization of the database drops, the application may not be sending queries, which might mean that portions of the application are down, which is usually not ideal.
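
To make the two-sided check concrete, here is a small Python sketch that compares a single reading against a low and a high threshold. The function name `check_kpi` and the returned labels are hypothetical, and the 20 to 70 percent range is simply the one from the CPU example above.

```python
def check_kpi(value, low, high):
    """Classify one KPI reading against its baseline thresholds.

    Both directions matter: a high value may point to an expensive query,
    a low value may mean the application has stopped sending queries.
    """
    if value > high:
        return "above baseline"
    if value < low:
        return "below baseline"
    return "normal"


# Thresholds taken from the CPU utilization example (percent).
LOW, HIGH = 20.0, 70.0

for reading in (85.0, 5.0, 45.0):
    print(reading, check_kpi(reading, LOW, HIGH))
```

A reading exactly on a threshold is treated as normal in this sketch; whether the boundary itself should alert is a design choice.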


It is important to note that behavior we do not consider normal for the database now might be normal in six months. It is therefore our responsibility to create monitoring cycles in which we periodically review our KPIs and update our baseline definitions.
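
One way to picture such a review cycle is to recompute the thresholds from only the most recent data each time. The sketch below is an assumption-laden example, not how any particular tool works: the function name `rolling_baseline`, the 90-day window, and the sample data are all invented for illustration.

```python
from datetime import datetime, timedelta


def rolling_baseline(samples, now, window_days=90):
    """Recompute (low, high) from readings inside a trailing time window.

    samples: list of (timestamp, value) pairs.
    Assumption: readings older than the window no longer describe
    normal behavior and are ignored.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [value for ts, value in samples if cutoff <= ts <= now]
    if not recent:
        raise ValueError("no samples inside the review window")
    return min(recent), max(recent)


# Hypothetical readings: one old outlier and two recent values.
now = datetime(2024, 6, 1)
samples = [
    (now - timedelta(days=200), 95.0),  # too old, ignored
    (now - timedelta(days=30), 25.0),
    (now - timedelta(days=2), 60.0),
]
low, high = rolling_baseline(samples, now)
```

Running this on a schedule, and reviewing the result before applying it, is one simple form of the monitoring cycle described above.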


The best monitoring tools out there will do that for us. With some tools, however, we have to start the process of determining the baseline manually, and with most we have to review the data ourselves, determine the baseline, and then manually update the KPI’s high and low thresholds.

Unfortunately, most monitoring tools don’t match this ideal definition. This does not mean that these tools are bad, only that they have room to improve.


There are in fact a few monitoring tools that do not provide baseline functionality at all, yet I would still consider them very good tools. I will go over some of them in a later post.