SQL queries generated during cube processing (MS_SSAS)

When Analysis Services needs to process a cube or a dimension, it sends queries to the relational database to retrieve the information it needs. Not all of these queries are simple SELECTs; in many situations Analysis Services generates complex queries. We do not have enough space to cover every scenario, so we provide some examples relating to SQL Server, and we advise readers to look at the SQL queries generated for their own cubes to check whether they can be optimized in some way.

Dimension processing

During dimension processing Analysis Services sends several queries, one for each attribute of the dimension, in the form of SELECT DISTINCT ColName, where ColName is the name of the column holding the attribute.
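The pattern of one SELECT DISTINCT per attribute can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the DimProduct table and its columns are hypothetical. The statements Analysis Services actually generates will differ in naming and detail.

```python
import sqlite3

# Hypothetical dimension table; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DimProduct ("
    "ProductKey INTEGER PRIMARY KEY, Color TEXT, Category TEXT)"
)
conn.executemany(
    "INSERT INTO DimProduct VALUES (?, ?, ?)",
    [(1, "Red", "Bikes"), (2, "Red", "Helmets"), (3, "Blue", "Bikes")],
)

# One SELECT DISTINCT per attribute column, mirroring the shape of the
# queries sent during dimension processing.
attribute_columns = ["ProductKey", "Color", "Category"]
distinct_values = {}
for col in attribute_columns:
    rows = conn.execute(f"SELECT DISTINCT {col} FROM DimProduct").fetchall()
    distinct_values[col] = sorted(r[0] for r in rows)

print(distinct_values)
```

Note that the de-duplication happens entirely inside the database engine; the caller only receives the distinct values.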

Many of these queries are run in parallel (exactly which ones depends on the attribute relationships defined on the Analysis Services dimension). SQL Server can therefore take advantage of its cache and perform only one physical read of the table, with all successive scans performed from memory. Nevertheless, keep in mind that the task of detecting the DISTINCT values of the attributes is done by SQL Server, not Analysis Services.

We also need to be aware that if our dimensions are built from complex views, they might confuse the SQL Server query optimizer and lead to poor SQL query performance. If, for example, we add a very complex WHERE condition to our view, then the condition will be evaluated more than once, because every attribute query runs against the same view. We have personally seen a situation where the processing of a simple time dimension with only a few hundred rows, which had a very complex WHERE condition, took tens of minutes to complete.


Dimensions with joined tables

If a dimension contains attributes that come from a joined table, the JOIN is performed by SQL Server, not Analysis Services. This situation arises very frequently when we define snowflakes instead of simpler star schemas. Since some attributes of a dimension are computed by taking their values from another dimension table, Analysis Services will send a query to SQL Server containing the INNER JOIN between the two tables.

Beware that the type of JOIN requested by Analysis Services is always an INNER JOIN. If, for any reason, you need a LEFT OUTER JOIN, then you need to avoid using joined tables inside the DSV and use, as we suggest, SQL views to obtain the desired result.
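The difference between the two approaches can be seen in a small sketch. It again uses sqlite3 as a stand-in for SQL Server, and the tables, the vDimProduct view, and the 'Unknown' placeholder are all hypothetical names chosen for illustration. The INNER JOIN silently drops the product that has no subcategory, while the view built on a LEFT OUTER JOIN keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimSubcategory (
    SubcategoryKey INTEGER PRIMARY KEY, SubcategoryName TEXT);
CREATE TABLE DimProduct (
    ProductKey INTEGER PRIMARY KEY, ProductName TEXT, SubcategoryKey INTEGER);
INSERT INTO DimSubcategory VALUES (10, 'Road Bikes');
INSERT INTO DimProduct VALUES (1, 'Road-150', 10);
INSERT INTO DimProduct VALUES (2, 'Unclassified Item', NULL);

-- A view that hides the snowflake and keeps products without a subcategory.
CREATE VIEW vDimProduct AS
SELECT p.ProductKey, p.ProductName,
       COALESCE(s.SubcategoryName, 'Unknown') AS SubcategoryName
FROM DimProduct AS p
LEFT OUTER JOIN DimSubcategory AS s
    ON s.SubcategoryKey = p.SubcategoryKey;
""")

# Shape of the query when the two tables are joined inside the DSV.
inner_rows = conn.execute("""
    SELECT p.ProductKey, s.SubcategoryName
    FROM DimProduct AS p
    INNER JOIN DimSubcategory AS s
        ON s.SubcategoryKey = p.SubcategoryKey
    ORDER BY p.ProductKey
""").fetchall()

# Shape of the query when the dimension is built on the view instead.
view_rows = conn.execute(
    "SELECT ProductKey, SubcategoryName FROM vDimProduct ORDER BY ProductKey"
).fetchall()

print(inner_rows)
print(view_rows)
```

With the view, the join logic lives in the relational database where we can inspect and tune it, and Analysis Services sees a single flat table.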

As long as all the joins are made on the primary keys, this will not lead to any problems but, in cases where the JOIN is not made on the primary key, bad performance might result. As we said before, if we succeed in the goal of exposing to Analysis Services a simple star schema, we will never have to handle these JOINs. As we argue below, if a snowflake is really needed we can still hide it from Analysis Services using views, and in these views we will have full control over, and knowledge of, the complexity of the query used.

Reference dimensions

Reference dimensions, when present in the cube definition, will lead to one of the best hidden and most dangerous types of JOIN. When we define the relationship between a dimension and a fact table, we can use the Referenced relationship type and use an intermediate dimension to relate the dimension to the fact table. Reference dimensions often appear in the design due to snowflakes or due to the need to reduce fact table size.

A referenced dimension may be materialized or not. If we decide to materialize a reference dimension (as BI Development Studio will suggest) the result is that the fact table query will contain a JOIN to the intermediate dimension, to allow Analysis Services to get the value of the key for the reference dimension.
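As a rough sketch of what this means for the fact table query, consider a fact table related to a geography dimension through an intermediate customer dimension. All table and column names here are hypothetical, sqlite3 stands in for SQL Server, and the exact SQL that Analysis Services emits will look different; the point is only that the fact table is joined to the intermediate dimension to fetch the key of the reference dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (
    CustomerKey INTEGER PRIMARY KEY, GeographyKey INTEGER);
CREATE TABLE FactSales (
    SalesId INTEGER PRIMARY KEY, CustomerKey INTEGER, SalesAmount REAL);
INSERT INTO DimCustomer VALUES (1, 7);
INSERT INTO DimCustomer VALUES (2, 8);
INSERT INTO FactSales VALUES (100, 1, 100.0);
INSERT INTO FactSales VALUES (101, 2, 50.0);
INSERT INTO FactSales VALUES (102, 1, 25.0);
""")

# The fact query picks up GeographyKey (the key of the reference dimension)
# by joining through the intermediate DimCustomer table.
fact_query = """
SELECT f.CustomerKey, f.SalesAmount, c.GeographyKey
FROM FactSales AS f
JOIN DimCustomer AS c
    ON c.CustomerKey = f.CustomerKey
ORDER BY f.SalesId
"""
fact_rows = conn.execute(fact_query).fetchall()
print(fact_rows)
```

On a fact table with many millions of rows, every extra join of this kind adds work to each processing run.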

If JOINs are a problem with dimension processing queries, they are a serious problem with fact table processing queries. It might be the case that SQL Server needs to write a large amount of data to its temporary database before returning information to Analysis Services. It all depends on the size of the intermediate table and the number of reference dimensions that appear in the cube design.

We are not going to say that referenced dimensions should not be used at all, as there are a few cases where reference dimensions are useful, and in the following chapters we will discuss them in detail. Nevertheless, we need to be aware that reference dimensions might create complex queries sent to SQL Server and this can cause severe performance problems during cube processing.

Fact dimensions

The processing of dimensions related to a measure group with a fact relationship type, usually created to hold degenerate dimensions, is performed in the same way as any other dimension. This means that a SELECT DISTINCT will be issued on all the degenerate dimension's attributes.

Clearly, as the dimension and the fact tables are the same, the query will ask for a DISTINCT over a fact table; given that fact tables can be very large, the query might take a long time to run. Nevertheless, if a degenerate dimension is needed and it is stored in a fact table, then there is no other choice but to pay the price with this query.

Distinct count measures

The last kind of query that we need to be aware of is when we have a measure group containing a DISTINCT COUNT measure. In this case, due to the way Analysis Services calculates distinct counts, the query to the fact table will be issued with an ORDER BY for the column we are performing the distinct count on.
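The shape of such a query, and one reason sorted input is convenient for counting distinct values, can be sketched as follows. The FactSales table is hypothetical and sqlite3 stands in for SQL Server. The single-pass loop is only an illustration of what sorted input makes possible; it is not a description of the actual algorithm inside Analysis Services.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FactSales ("
    "SalesId INTEGER PRIMARY KEY, CustomerKey INTEGER, SalesAmount REAL)"
)
conn.executemany(
    "INSERT INTO FactSales VALUES (?, ?, ?)",
    [(1, 30, 10.0), (2, 10, 5.0), (3, 30, 7.5), (4, 20, 1.0)],
)

# Fact query sorted on the distinct count column (CustomerKey here).
rows = conn.execute(
    "SELECT CustomerKey, SalesAmount FROM FactSales ORDER BY CustomerKey"
).fetchall()

# With sorted input, distinct values can be counted in a single pass by
# watching for changes between consecutive keys.
distinct_customers = 0
previous_key = object()  # sentinel that compares unequal to any real key
for customer_key, _amount in rows:
    if customer_key != previous_key:
        distinct_customers += 1
        previous_key = customer_key

print(distinct_customers)
```

The cost of that convenience is the sort itself, which the relational database has to perform on the whole fact table (or partition) being processed.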

Needless to say, this will lead to very poor performance because we are asking SQL Server to sort a fact table on a column that is not part of the clustered index (usually the clustered index is built on the primary key). The pressure on the temporary database will be tremendous and the query will take a lot of time.

There are some optimizations, mostly pertinent to partitioning, that need to be done when we have DISTINCT COUNT measures in very big fact tables. What we want to point out is that in this case a good knowledge of the internal behavior of Analysis Services is necessary in order to avoid bad performance when processing.
