PostgreSQL巧用ctid等价改写SQL

312 阅读 0 评论 206 点赞

我是靠谱客的博主阔达信封，这篇文章主要介绍PostgreSQL巧用ctid等价改写SQL，现在分享给大家，希望可以做个参考。

PostgreSQL中的ctid即行号，有点类似oracle中的rowid，今天碰到一个SQL刚好利用到pg中ctid的特点得到了极大的优化。

原始SQL(语句里内容已修改)：

bill@bill=>do language plpgsql  $$
bill$# declare 
bill$# v_cnt int;
bill$# begin
bill$# select count(*) into v_cnt from t1 ;
bill$# if v_cnt>0 
bill$# then  
bill$# create table t2 as select * from t1 limit 100;
bill$# else 
bill$# create table t3 as select * from t1 limit 10;
bill$# end if;
bill$# end;
bill$# $$;
DO
Time: 62.643 ms

我们先来分析下上面的SQL，其意思大致为：获取t1表中的记录数，判断是否大于0，如果大于0则创建表t2，否则创建表t3。

因此上面的代码就写成这样了，但是我们思考下其真正的需求就是如此吗？
我们可以换个说法：如果t1表有记录则创建表t2，没有则创建表t3。

两者有区别吗？其实区别还是很大的，前者可是强调获取记录数，我们是不是一定要遍历整个表得出一个记录数才知道是否大于0？
真正需求的理解可以让我们这样实现，只要从t1表中成功获取到第一条记录，就可以停止检索了，表示该表有记录了，难道事实不是这样？

因此我们可以将原先的sql从select count() from t1;改造为：
select count() from t1 where ctid=’(0,1)’;

那么上面的SQL整个就可以写成：

bill@bill=>do language plpgsql  $$
bill$# declare 
bill$# v_cnt int;
bill$# begin
bill$# select count(*) into v_cnt from t1 where ctid='(0,1)' ;
bill$# if v_cnt>0 
bill$# then  
bill$# create table t2 as select * from t1 limit 100;
bill$# else 
bill$# create table t3 as select * from t1 limit 10;
bill$# end if;
bill$# end;
bill$# $$;
DO
Time: 2.930 ms

这里时间从62ms缩小到了2.9ms，提升了30倍！
为什么这样呢？我们使用ctid=’(0,1)'查询，这样只需要扫描到第一条记录就停止了，这一点我们从执行计划中也可以得到验证：

bill@bill=>explain analyze select count(*) from t1;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=14542.55..14542.56 rows=1 width=8) (actual time=86.535..86.535 rows=1 loops=1)
   ->  Gather  (cost=14542.33..14542.54 rows=2 width=8) (actual time=86.396..87.819 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=13542.33..13542.34 rows=1 width=8) (actual time=82.845..82.846 rows=1 loops=3)
               ->  Parallel Seq Scan on t1  (cost=0.00..12500.67 rows=416667 width=0) (actual time=0.009..47.854 rows=333333 loops=3)
 Planning Time: 0.087 ms
 Execution Time: 87.880 ms
(8 rows)


bill@bill=>explain analyze select count(*) from t1 where ctid='(0,1)';
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Aggregate  (cost=1.31..1.32 rows=1 width=8) (actual time=0.011..0.011 rows=1 loops=1)
   ->  Tid Scan on t1  (cost=0.00..1.31 rows=1 width=0) (actual time=0.006..0.006 rows=1 loops=1)
         TID Cond: (ctid = '(0,1)'::tid)
 Planning Time: 0.080 ms
 Execution Time: 0.039 ms
(5 rows)