提高第一次查询的性能[英] Improve performance of first query

本文是小编为大家收集整理的关于提高第一次查询的性能的处理/解决方法,可以参考本文帮助大家快速定位并解决问题,中文翻译不准确的可切换到English标签页查看源文。

问题描述

如果执行以下数据库(Postgres)查询,则第二个调用速度更快.

我想第一个查询很慢,因为操作系统(Linux)需要从磁盘获取数据.第二个查询受益于文件系统级别和Postgres中的缓存.

有没有一种方法来优化数据库以在 first 呼叫上快速获取结果?

第一次调用(慢)

foo3_bar_p@BAR-FOO3-Test:~$ psql

foo3_bar_p=# explain analyze SELECT "foo3_beleg"."id", ... FROM "foo3_beleg" WHERE 
foo3_bar_p-# (("foo3_beleg"."id" IN (SELECT beleg_id FROM foo3_text where 
foo3_bar_p(# content @@ 'footown'::tsquery)) AND "foo3_beleg"."belegart_id" IN 
foo3_bar_p(# ('...', ...));
                                                                                             QUERY PLAN                                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=75314.58..121963.20 rows=152 width=135) (actual time=27253.451..88462.165 rows=11 loops=1)
   ->  HashAggregate  (cost=75314.58..75366.87 rows=5229 width=4) (actual time=16087.345..16113.988 rows=17671 loops=1)
         ->  Bitmap Heap Scan on foo3_text  (cost=273.72..75254.67 rows=23964 width=4) (actual time=327.653..16026.787 rows=27405 loops=1)
               Recheck Cond: (content @@ '''footown'''::tsquery)
               ->  Bitmap Index Scan on foo3_text_content_idx  (cost=0.00..267.73 rows=23964 width=0) (actual time=281.909..281.909 rows=27405 loops=1)
                     Index Cond: (content @@ '''footown'''::tsquery)
   ->  Index Scan using foo3_beleg_pkey on foo3_beleg  (cost=0.00..8.90 rows=1 width=135) (actual time=4.092..4.092 rows=0 loops=17671)
         Index Cond: (id = foo3_text.beleg_id)
         Filter: ((belegart_id)::text = ANY ('{...
         Rows Removed by Filter: 1
 Total runtime: 88462.809 ms
(11 rows)

第二个通话(fast)

  Nested Loop  (cost=75314.58..121963.20 rows=152 width=135) (actual time=127.569..348.705 rows=11 loops=1)
   ->  HashAggregate  (cost=75314.58..75366.87 rows=5229 width=4) (actual time=114.390..133.131 rows=17671 loops=1)
         ->  Bitmap Heap Scan on foo3_text  (cost=273.72..75254.67 rows=23964 width=4) (actual time=11.961..97.943 rows=27405 loops=1)
               Recheck Cond: (content @@ '''footown'''::tsquery)
               ->  Bitmap Index Scan on foo3_text_content_idx  (cost=0.00..267.73 rows=23964 width=0) (actual time=9.226..9.226 rows=27405 loops=1)
                     Index Cond: (content @@ '''footown'''::tsquery)
   ->  Index Scan using foo3_beleg_pkey on foo3_beleg  (cost=0.00..8.90 rows=1 width=135) (actual time=0.012..0.012 rows=0 loops=17671)
         Index Cond: (id = foo3_text.beleg_id)
         Filter: ((belegart_id)::text = ANY ('...
         Rows Removed by Filter: 1
 Total runtime: 348.833 ms
(11 rows)

foo3_text表(28m行)的表布局

foo3_egs_p=# \d foo3_text
                                 Table "public.foo3_text"
  Column  |         Type          |                         Modifiers                          
----------+-----------------------+------------------------------------------------------------
 id       | integer               | not null default nextval('foo3_text_id_seq'::regclass)
 beleg_id | integer               | not null
 index_id | character varying(32) | not null
 value    | text                  | not null
 content  | tsvector              | 
Indexes:
    "foo3_text_pkey" PRIMARY KEY, btree (id)
    "foo3_text_index_id_2685e3637668d5e7_uniq" UNIQUE CONSTRAINT, btree (index_id, beleg_id)
    "foo3_text_beleg_id" btree (beleg_id)
    "foo3_text_content_idx" gin (content)
    "foo3_text_index_id" btree (index_id)
    "foo3_text_index_id_like" btree (index_id varchar_pattern_ops)
Foreign-key constraints:
    "beleg_id_refs_id_6e6d40770e71292" FOREIGN KEY (beleg_id) REFERENCES foo3_beleg(id) DEFERRABLE INITIALLY DEFERRED
    "index_id_refs_name_341600137465c2f9" FOREIGN KEY (index_id) REFERENCES foo3_index(name) DEFERRABLE INITIALLY DEFERRED

硬件更改(SSD而不是传统磁盘)或RAM磁盘是可能的.但是也许当前的硬件也可以取得更快的结果.

版本:x86_64-Inninnown-linux-gnu

on Postgresql 9.1.2

如果您需要更多详细信息,请发表评论.

推荐答案

Postgres为您提供了在运行时查询执行以确定I/O操作优先级时进行一些配置的机会.

random_page_cost(floating point) ----- (参考)可能会对您有所帮助.它基本上将设置您的IO/CPU操作比率.

更高的值表示I/O很重要,我具有顺序磁盘;较低的值表示I/O并不重要,我有随机访问磁盘.

默认值是4.0,可能是您要增加并测试您的查询是否需要更短的时间.

不要忘记,您的I/O优先级将取决于您的列计数,行计数.

大,但是由于您的表明是btree,因此您的CPU优先级的下降速度要比I/O优先级要快得多.您基本上可以将复杂性映射到优先级.

CPU Priority = O(log(x))
I/O Priority = O(x)

总的来说,这意味着,如果PostGre的值4.0将用于100k条目,则应将其设置为(大约)(4.0 * log(100k) * 10M)/(log(10M) * 100k) 10M入口.

其他推荐答案

同意Julius,但是,如果您只需要Foo3_Beleg中的东西,请尝试使用(如果您也粘贴了SQL,这将有所帮助,而不仅仅是您的解释计划).

>

select ...
from foo3_beleg b
where exists
(select 1 from foo_text s where t.beleg_id = b.id)
....

但是,我怀疑您在第1次的"唤醒"只是您的数据库将子查询行加载到内存中.无论如何,这可能会发生,尽管存在的速度通常要比IN快得多(如果不包含硬编码列表,很少需要INS,如果我审查SQL,则是黄色标志).

其他推荐答案

您第一次执行查询时,Postgres将从磁盘上加载数据,即使使用良好的硬盘驱动器也很慢.您第二次运行查询时,它将从RAM加载显然更快的先前加载数据.

解决此问题的解决方案是将关系数据加载到操作系统缓冲区缓存或使用以下方式的PostgreSQL缓冲区缓存中

int8 pg_prewarm(regclass, mode text default 'buffer', fork text default 'main', first_block int8 default null, last_block int8 default null):

第一个论点是要预热的关系.第二个论点是要使用的前传加热方法,如下所述.第三是要预热的关系叉,通常是主要的.第四个参数是要预热的第一个块号(NULL被接受为零的同义词).第五个参数是前往前的最后一个块号(NULL表示通过关系中的最后一个块的prewarm).返回值是预热的块数量.

有三种可用的预热方法.如果支持此操作系统或否则会引发错误,请在操作系统上发出异步预取请求.读取所需的块范围;与预摘要不同,这是同步的,并且在所有平台和构建上都得到了支持,但可能会较慢.缓冲区将所需的块范围读取到数据库缓冲区缓存中.

请注意,使用这些方法中的任何一种,试图在使用预取或读取或使用缓冲区时通过OS来缓存的预热块比可以缓存更多的块 - 可能会导致较低的块被驱逐为较高的编号.阅读了块.预热数据也没有特殊的保护范围,因此可以在阅读后不久将其他系统活动驱逐新预热的块.相反,预热还可以驱逐缓存中的其他数据.由于这些原因,当缓存在很大程度上是空的时,预热通常是最有用的.

.

source

希望这有所帮助!

本文地址:https://www.itbaoku.cn/post/1763839.html

问题描述

If the following database (postgres) queries are executed, the second call is much faster.

I guess the first query is slow since the operating system (linux) needs to get the data from disk. The second query benefits from caching at filesystem level and in postgres.

Is there a way to optimize the database to get the results fast on the first call?

First call (slow)

foo3_bar_p@BAR-FOO3-Test:~$ psql

foo3_bar_p=# explain analyze SELECT "foo3_beleg"."id", ... FROM "foo3_beleg" WHERE 
foo3_bar_p-# (("foo3_beleg"."id" IN (SELECT beleg_id FROM foo3_text where 
foo3_bar_p(# content @@ 'footown'::tsquery)) AND "foo3_beleg"."belegart_id" IN 
foo3_bar_p(# ('...', ...));
                                                                                             QUERY PLAN                                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=75314.58..121963.20 rows=152 width=135) (actual time=27253.451..88462.165 rows=11 loops=1)
   ->  HashAggregate  (cost=75314.58..75366.87 rows=5229 width=4) (actual time=16087.345..16113.988 rows=17671 loops=1)
         ->  Bitmap Heap Scan on foo3_text  (cost=273.72..75254.67 rows=23964 width=4) (actual time=327.653..16026.787 rows=27405 loops=1)
               Recheck Cond: (content @@ '''footown'''::tsquery)
               ->  Bitmap Index Scan on foo3_text_content_idx  (cost=0.00..267.73 rows=23964 width=0) (actual time=281.909..281.909 rows=27405 loops=1)
                     Index Cond: (content @@ '''footown'''::tsquery)
   ->  Index Scan using foo3_beleg_pkey on foo3_beleg  (cost=0.00..8.90 rows=1 width=135) (actual time=4.092..4.092 rows=0 loops=17671)
         Index Cond: (id = foo3_text.beleg_id)
         Filter: ((belegart_id)::text = ANY ('{...
         Rows Removed by Filter: 1
 Total runtime: 88462.809 ms
(11 rows)

Second call (fast)

  Nested Loop  (cost=75314.58..121963.20 rows=152 width=135) (actual time=127.569..348.705 rows=11 loops=1)
   ->  HashAggregate  (cost=75314.58..75366.87 rows=5229 width=4) (actual time=114.390..133.131 rows=17671 loops=1)
         ->  Bitmap Heap Scan on foo3_text  (cost=273.72..75254.67 rows=23964 width=4) (actual time=11.961..97.943 rows=27405 loops=1)
               Recheck Cond: (content @@ '''footown'''::tsquery)
               ->  Bitmap Index Scan on foo3_text_content_idx  (cost=0.00..267.73 rows=23964 width=0) (actual time=9.226..9.226 rows=27405 loops=1)
                     Index Cond: (content @@ '''footown'''::tsquery)
   ->  Index Scan using foo3_beleg_pkey on foo3_beleg  (cost=0.00..8.90 rows=1 width=135) (actual time=0.012..0.012 rows=0 loops=17671)
         Index Cond: (id = foo3_text.beleg_id)
         Filter: ((belegart_id)::text = ANY ('...
         Rows Removed by Filter: 1
 Total runtime: 348.833 ms
(11 rows)

Table layout of the foo3_text table (28M rows)

foo3_egs_p=# \d foo3_text
                                 Table "public.foo3_text"
  Column  |         Type          |                         Modifiers                          
----------+-----------------------+------------------------------------------------------------
 id       | integer               | not null default nextval('foo3_text_id_seq'::regclass)
 beleg_id | integer               | not null
 index_id | character varying(32) | not null
 value    | text                  | not null
 content  | tsvector              | 
Indexes:
    "foo3_text_pkey" PRIMARY KEY, btree (id)
    "foo3_text_index_id_2685e3637668d5e7_uniq" UNIQUE CONSTRAINT, btree (index_id, beleg_id)
    "foo3_text_beleg_id" btree (beleg_id)
    "foo3_text_content_idx" gin (content)
    "foo3_text_index_id" btree (index_id)
    "foo3_text_index_id_like" btree (index_id varchar_pattern_ops)
Foreign-key constraints:
    "beleg_id_refs_id_6e6d40770e71292" FOREIGN KEY (beleg_id) REFERENCES foo3_beleg(id) DEFERRABLE INITIALLY DEFERRED
    "index_id_refs_name_341600137465c2f9" FOREIGN KEY (index_id) REFERENCES foo3_index(name) DEFERRABLE INITIALLY DEFERRED

Hardware changes (SSD instead of traditional disks) or RAM disks are possible. But maybe there the current hardware can do faster results, too.

Version: PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu

Please leave a comment if you need more details.

推荐答案

Postgres is providing you a chance to do some configuration on runtime query executing for deciding your I/O operation priority.

random_page_cost(floating point) -(reference) is what may help you. It will basically set your IO/CPU operation ratio.

Higher value means I/O is important, I have sequential disk; and lower value means I/O is not important, I have random-access disk.

Default value is 4.0, and may be you want to increase and test if your query take shorter time.

Do not forget, your I/O priority will depend on your column count, row count.

A big BUT; since your indicies are btree, your CPU priority is going down much faster than I/O priorities going up. You can basically map complexities to priorities.

CPU Priority = O(log(x))
I/O Priority = O(x)

All in all, this means, if Postgre's value 4.0 would for 100k entries, You should set it to (approx.) (4.0 * log(100k) * 10M)/(log(10M) * 100k) for 10M entry.

其他推荐答案

Agree with Julius but, if you only need stuff from foo3_beleg, try EXISTS in instead (and it would help if you'd pasted your sql too, not just your explain plan).

select ...
from foo3_beleg b
where exists
(select 1 from foo_text s where t.beleg_id = b.id)
....

However, I suspect your "wake up" on the 1st pass is just your db loading up the IN subquery rows into memory. That will likely happen regardless, though an EXISTS is generally much faster than an IN (INs are rarely needed, if not containing hardcoded lists, and a yellow flag if I review sql).

其他推荐答案

The first time you execute the query, postgres will load the data from the disk which is slow even with a good hard drive. The second time you run your query it will load the previously loaded data from the RAM which is obviously faster.

The solution to this problem would be to load relation data into either the operating system buffer cache or the PostgreSQL buffer cache with:

int8 pg_prewarm(regclass, mode text default 'buffer', fork text default 'main', first_block int8 default null, last_block int8 default null) :

The first argument is the relation to be prewarmed. The second argument is the prewarming method to be used, as further discussed below; the third is the relation fork to be prewarmed, usually main. The fourth argument is the first block number to prewarm (NULL is accepted as a synonym for zero). The fifth argument is the last block number to prewarm (NULL means prewarm through the last block in the relation). The return value is the number of blocks prewarmed.

There are three available prewarming methods. prefetch issues asynchronous prefetch requests to the operating system, if this is supported, or throws an error otherwise. read reads the requested range of blocks; unlike prefetch, this is synchronous and supported on all platforms and builds, but may be slower. buffer reads the requested range of blocks into the database buffer cache.

Note that with any of these methods, attempting to prewarm more blocks than can be cached — by the OS when using prefetch or read, or by PostgreSQL when using buffer — will likely result in lower-numbered blocks being evicted as higher numbered blocks are read in. Prewarmed data also enjoys no special protection from cache evictions, so it is possible for other system activity may evict the newly prewarmed blocks shortly after they are read; conversely, prewarming may also evict other data from cache. For these reasons, prewarming is typically most useful at startup, when caches are largely empty.

Source

Hope this helped !