执行计划不准确,该如何解决-MSSQL教程-爱易网页

执行计划不准确,该如何解决

日期：2014-05-18　浏览次数：20792 次

执行计划不准确
http://social.msdn.microsoft.com/Forums/en/sqldatabaseengine/thread/35d3df7d-ff82-48a4-bfbc-ffb95e31017f
没人鸟我...来csdn发个..
多谢

I have a typical fact - dimensional data warehouse, and when I am querying data from it I see strange thing happens...

sample query(query 1):

SQL code

Select count(1) from fct_device  t
Where t.monitored_Id=2 and
      t.data_time_id in (
       [b] select data_time_id from dim_date t
        where  data_date >= 28 and
               not exists(select 1 from dim_date
                          where data_date > t.data_date and 
                          data_year = t.data_year and
                          data_month = t.data_month )[/b]
        )

Tables:

fct_device is partitioned by data_time id (int) and contains roughly 20,000,000 records.

dim_date is a relatively small dimension table, containing ~30 records.

If I only execute the sub-query(marked in bold) it returns 7 records(the sub-query is supposed to return all the end-of-month date)

Query 1 takes more than 1 min to complete. The execution plan for query 1:

|--Compute Scalar(DEFINE:([Expr1010]=CONVERT_IMPLICIT(int,[Expr1013],0)))
|--Stream Aggregate(DEFINE:([Expr1013]=Count(*)))
|--Nested Loops(Left Anti Semi Join, WHERE:(dim_date.[DATA_DATE]>fct_device.[DATA_DATE] AND ....)
|--Nested Loops(Inner Join, OUTER REFERENCES:([t].[DATA_TIME_ID]))
| |--Clustered Index Scan(OBJECT:(dim_date. where data_date>=28)
| |--Table Scan(OBJECT:(fct_device AS [t])....)
|--Clustered Index Scan(dim_date)

The dim_date(predicate data_date>=28) first does Nested Loops join with the fact table, and then join with dim_date again(to do the Not Exists process). And seems it doesn't use the partition at all(no parallelism)...

I initially believed query 1 will be optimized to execute the sub-query first and then do the other steps, and it will be equivalent to the query below:

(query 2)

SQL code

Select count(1) from fct_device  t
Where t.monitored_Id=2 and
      t.data_time_id in (
       20111031,20111130,20111231,20120131,20120229,20120331,20120430

       )

Plan for query 2:

|--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[globalagg1006],0)))
|--Stream Aggregate(DEFINE:([globalagg1006]=SUM([partialagg1005])))
|--Parallelism(Gather Streams)
|--Stream Aggregate(DEFINE:([partialagg1005]=Count(*)))
|--Table Scan(...)

Query 2 takes 15 sec to complete. You can see it takes the advantage of the partition table by running in parallelism.

After that I did another modification and it's the most weird part...

query 3:

SQL code

Select count(1) from fct_device  t
Where t.monitored_Id=2 and
      t.data_time_id in (
        select [b]Cast(data_time_id as char)[/b] from dim_date t
        where  data_date >= 28 and
               not exists(select 1 from dim_date
                          where data_date > t.data_date and 
                          data_year = t.data_year and
                          data_month = t.data_month )
        )

The only change I made(marked in bold) is t

免责声明： 本文仅代表作者个人观点，与爱易网无关。其原创性以及文中陈述文字和内容未经本站证实，对本文以及其中全部或者部分内容、文字的真实性、完整性、及时性本站不作任何保证或承诺，请读者仅作参考，并请自行核实相关内容。

执行计划不准确,该如何解决

相关资料更多>

推荐阅读更多>