[Note from Pinal]: This is a new episode of Notes from the Field series. When we build an application, we build it with zeal and enthusiasm. We believe it will do well for many years but it is not the case always. After a while performance started to go down and everything is not in the best shape. This is the time when we need to monitor performance and take action based on our analysis.
In this episode of the Notes from the Field series I asked SQL Expert Bill Anton a very crucial question – How to monitor the performance of analysis services? Bill was very kind to answer the questions and provides plenty of information about how a novice developer can work with SSAS.
When it comes to ad-hoc query performance in business intelligence solutions, very few technologies rival a well-designed Analysis Services Multidimensional cube or Tabular model. And when that cube/tabular model is performing well, as it usually does in the beginning, life is good!
Over time, however, things can change. The underlying business evolves (hopefully) and information workers start asking different questions, resulting in new query patterns. The business grows and now more users are asking more questions of larger data volumes. Performance starts to deteriorate. Queries are taking longer to complete and nightly processing starts to run past the end of the maintenance window into business hours. Users are complaining, management is unhappy, and life is no longer good.
How did we not see this coming?
In my experience, 80% of the time the reason no one sees this type of issue coming is that there wasn’t a performance monitoring solution in place. The other 20% who have a performance monitoring solution simply aren’t using it or reviewing information being collected.
I don’t know why so many Analysis Services environments are neglected (I’m not a Business or IT therapist) but I’m going to tell you what steps need to be taken if you want to avoid it.
The Secret is Simple
The first step to maintaining a well-running Analysis Services instance is simple. Take measurements!
Taking regular measurements is the only way to know how things are performing over time. This seems ridiculously obvious, but so few companies actually do it. My hunch is that these folks just don’t know what needs to be measured.
Analysis Services can be broken down into 2 fundamental types of workloads:
Processing is the term used to describe how data gets loaded into the Analysis Services database. This workload is usually done at night, outside of business hours, so as not to coincide with user activity.
From a performance perspective, the primary things to keep an eye on are:
This is the amount of time it takes to load data into the Analysis Services database. As long as the processing duration fits within the scheduled maintenance window, there’s not much to worry about. However, if this duration is increasing over time and you think it will eventually run past that window, then you’ll need to review the rest of the information to figure out “why” and “what to do”.
How long does it take to process the Analysis Services database?
Is the processing duration increasing over time?
Keeping an eye on resource consumption (e.g. CPU, memory, disk, network) during processing is also a good idea. This kind of information can help shed some light on bottlenecks and provide guidance when coming up with a solution to processing related problems.
Is the processing workload requiring more and more memory? Are we maxing out CPU? How’s disk space?
There are many solutions to problems faced during processing, but without some insight into what’s happening on the system, it’s hard to know which solution is the optimal one.
For example, say we have a Multidimensional cube and notice that the processing duration for one of the measure groups is steadily increasing over time. We review the resource consumption and see that there’s plenty of CPU/Memory/IO to spare. In this case, we may consider partitioning this particular fact table and only processing the most recent partition or processing the partitions in parallel.
Pro Tip: Breaking up processing into stages will provide context to the information above and make it much easier to see which part(s) of the Analysis Services database is contributing to the increase in processing duration or memory consumption.
This refers to any activity that generates queries against the SSAS database. It could be users running reports, loading dashboards, analyzing data via pivot tables, etc. Because users typically don’t run the same queries at the same time every day, this workload can be much more complicated to monitor.
The key to success is to start with the high-level stuff and only go into the details if/when necessary.
The single most important thing to track for this type of workload is a log of the actual queries being executed against the Analysis Services database.
Not only will you be able to see which queries are slow, but you’ll also have the actual MDX/DAX executed. You won’t have to wait for the user to complain (telling you their report is taking too long to load) because you’ll already have the query and can start reviewing it immediately.
Some Analysis Services implementations actually have service level agreements (SLA) with criteria such as “no query should take more than 30 seconds to complete” and “the average query response time should be less than 5 seconds”. If you’re tracking every query against the Analysis Services database, not only will you know if the SLA has been violated, but you’ll know which query or queries it was that led up to the violation and can start troubleshooting immediately.
Tracking the number of folks using your system (and when they are using it) will prove very helpful for knowing if/when to start considering options for scaling the system up and/or out.
This information can usually be extracted from whatever mechanism you use to track queries being executed against the Analysis Services database, but it is important enough to deserve its own section.
In the same vein as the above discussion around tracking resource consumption of the processing workload, you’ll also want to track the same measures (e.g. CPU, memory, disk, network) throughout the day. This information may provide some clues as to why a query is slow.
For example, say you’re reviewing the top 10 slowest queries from the previous week and find several of the queries are now running very fast. At this point, you can switch over and start looking at the state of the system last week at the time the query was slow (from a resource consumption perspective) and perhaps find some clues, such as memory pressure or CPU bottleneck caused by a spike in activity.
Here are some examples of the types of questions you should be able to answer with the above information:
What are the top 10 slowest queries each week?
Who are your top users and what is the average number of queries they execute per day/week/month?
What are the average number of users per day/week/month?
What is the max number/average number of concurrent users per day/week/month?
Pro Tip: some folks incorrectly assume the OLAPQueryLog tracks queries. However, this table only tracks parts of queries (storage engine requests). A single query executed against an Analysis Services database could potentially generate 10s of records in this table. The implication is that it doesn’t give you the whole story and you won’t always be able to determine which queries are slow.
Now that you know what types of information you should be collecting and reviewing regularly, the next step is to figure out how you’re going to collect that information. The good news is that there are quite a few options available. The bad news is that you’ll have to wait until next time to find out. Here is the SQLPASS session which discusses the same concept Analysis Services: Show Me Where It Hurts. To continue learning about this subject, you can click over here to read the second part.
If you want to get started on performance monitoring with SSAS with the help of experts, read more over at Fix Your SQL Server.
Reference: Pinal Dave (https://blog.sqlauthority.com)