如何考虑存储及性能的改进和优化
有关系模式:User(userId, userName), Article(articleId, userId, title, content),Vote(articleId, score),User为用户关系,Article为用户发表的文章关系,Vote为文章得票关系,title为文章标题、score为得票数。
(1)用SQL语言查询所有没发表过文章的用户名;
(2)用SQL语言查询得票数大于100的所有文章标题,按得票数倒序排列;
(3)用SQL语言查询出发表文章数大于5,文章平均得票数大于100的用户名,按平均得票数倒序排列;
(4)设计这些表的主键、外键和索引,并指出上面三个查询所使用的索引。
(5)当用户数超过1000万,文章数超过1亿时,如何考虑存储及性能的改进和优化?
------解决方案--------------------1、
select
*
from
[user] a
where
not exists(select 1 from Article where UserID=a.userId)
或
select
a.*
from
[User] a
left join
Article b on a.userId=b.UserID where b.ID is null
2、
select
a.[title],b.score
from
Article a
join
Vote b on a.ArticleID =b.ArticleID and b.score>100
order by b.Score desc
或
select
*
from
Article a ,Vote b
where a.ArticleID=b.ArticleID and b.Score>100 order by b.Score desc
3、
select
*
from
[User] a
where
(select count(1) from Article where UserID=a.userId)>5
and
exists(select 1 from Article b join Vote c on b.ArticleID=c.ArticleID and b.UserID=a.userId and avg(c.Score)>100)
order by
(select avg(c.Score)>100 from Article b join Vote c on b.ArticleID=c.ArticleID and b.UserID=a.userId ) desc
或
----Vote 中的Score 没有时,数量为0的情况下
select
a.UserId,a.userName,avg(c.Score)
as [平均数],count(1) as[文章数]
from
[User] a
left join
Article b on a.userId=b.UserID
left join
Vote c on b.ArticleID=c.ArticleID
where
avg(c.Score) >100
group by
a.UserId,a.userName
having count(1)>5
User:UserID 唯一聚集
Article:UserID,ArticleID 创组合索引
Vote:ArticleID 唯一聚集
------解决方案--------------------关注!
------解决方案--------------------order by
(select avg(c.Score) from Article b join Vote c on b.ArticleID=c.ArticleID and b.UserID=a.userId ) desc --去掉>100
------解决方案--------------------因为所查询的内容
大多数都是和userid有强关系
或者说
都是根据userid 来把 Article表 进行group 分组
ArticleID 本身应该是自增列吧
------解决方案--------------------mark
------解决方案--------------------Article表上为什么要创组合索引,只创ArticleID 索引不行吗?
因为两列都存在where中,有覆盖索引理论上会快,
分别建索引,会扫描两次