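I am trying to select all rows of a table with about 500 million rows using PostgreSQL 11.
On a VM with 32 CPU cores, 256 GB of RAM, and an SSD with read/write speeds of up to ~200 MB/s, this takes about 15 minutes. That is much longer than I expected, given that I have seen people select a million rows in ~1 s (https://dba.stackexchange.com/questions/188407/effectively-handle-10-100-millions-row-table-of-unrelated-data), although they do not sort the rows.
Queries on this table will mostly be SELECTs over 80% to 100% of the table, with a filter on datetime and with the rows ordered by datetime.
Here is the description of the table: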
postgres=# \d+ ohlcv;
Table "public.ohlcv"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+-----------------------------+-----------+----------+---------+---------+--------------+-------------
datetime | timestamp without time zone | | not null | | plain | |
open | real | | not null | | plain | |
high | real | | not null | | plain | |
low | real | | not null | | plain | |
close | real | | not null | | plain | |
volume | integer | | not null | | plain | |
Indexes:
"brin_datetime" brin (datetime)
All rows were added in one go, and the BRIN index was added afterwards.
Here is the query, which seems to use 8 CPUs instead of the 32 available:
postgres=# explain analyze
postgres-# select * from ohlcv order by datetime;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Gather Merge (cost=20712603.78..96175039.28 rows=610230784 width=28) (actual time=175360.971..721544.003 rows=610230801 loops=1)
Workers Planned: 8
Workers Launched: 8
-> Sort (cost=20711603.64..20902300.76 rows=76278848 width=28) (actual time=125461.665..170299.327 rows=67803422 loops=9)
Sort Key: datetime
Sort Method: external merge Disk: 2429104kB
Worker 0: Sort Method: external merge Disk: 2404680kB
Worker 1: Sort Method: external merge Disk: 2406280kB
Worker 2: Sort Method: external merge Disk: 2656672kB
Worker 3: Sort Method: external merge Disk: 2635904kB
Worker 4: Sort Method: external merge Disk: 2637600kB
Worker 5: Sort Method: external merge Disk: 2643400kB
Worker 6: Sort Method: external merge Disk: 2437272kB
Worker 7: Sort Method: external merge Disk: 2439272kB
-> Parallel Seq Scan on ohlcv (cost=0.00..5249780.48 rows=76278848 width=28) (actual time=0.049..42506.065 rows=67803422 loops=9)
Planning Time: 0.566 ms
Execution Time: 1059414.396 ms
(17 rows)
Here is the PostgreSQL configuration:
check_function_bodies | on | Check function bodies during CREATE FUNCTION.
checkpoint_completion_target | 0.5 | Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval.
checkpoint_flush_after | 256kB | Number of pages after which previously performed writes are flushed to disk.
checkpoint_timeout | 5min | Sets the maximum time between automatic WAL checkpoints.
checkpoint_warning | 30s | Enables warnings if checkpoint segments are filled more frequently than this.
client_encoding | UTF8 | Sets the client's character set encoding.
client_min_messages | notice | Sets the message levels that are sent to the client.
cluster_name | | Sets the name of the cluster, which is included in the process title.
commit_delay | 0 | Sets the delay in microseconds between transaction commit and flushing WAL to disk.
commit_siblings | 5 | Sets the minimum concurrent open transactions before performing commit_delay.
config_file | /var/lib/postgresql/data/postgresql.conf | Sets the server's main configuration file.
constraint_exclusion | partition | Enables the planner to use constraints to optimize queries.
cpu_index_tuple_cost | 0.005 | Sets the planner's estimate of the cost of processing each index entry during an index scan.
cpu_operator_cost | 0.0025 | Sets the planner's estimate of the cost of processing each operator or function call.
cpu_tuple_cost | 0.01 | Sets the planner's estimate of the cost of processing each tuple (row).
cursor_tuple_fraction | 0.1 | Sets the planner's estimate of the fraction of a cursor's rows that will be retrieved.
data_checksums | off | Shows whether data checksums are turned on for this cluster.
data_directory | /var/lib/postgresql/data | Sets the server's data directory.
data_directory_mode | 0700 | Mode of the data directory.
data_sync_retry | off | Whether to continue running after a failure to sync data files.
DateStyle | ISO, MDY | Sets the display format for date and time values.
db_user_namespace | off | Enables per-database user names.
deadlock_timeout | 1s | Sets the time to wait on a lock before checking for deadlock.
debug_assertions | off | Shows whether the running server has assertion checks enabled.
debug_pretty_print | on | Indents parse and plan tree displays.
debug_print_parse | off | Logs each query's parse tree.
debug_print_plan | off | Logs each query's execution plan.
debug_print_rewritten | off | Logs each query's rewritten parse tree.
default_statistics_target | 100 | Sets the default statistics target.
default_tablespace | | Sets the default tablespace to create tables and indexes in.
default_text_search_config | pg_catalog.english | Sets default text search configuration.
default_transaction_deferrable | off | Sets the default deferrable status of new transactions.
default_transaction_isolation | read committed | Sets the transaction isolation level of each new transaction.
default_transaction_read_only | off | Sets the default read-only status of new transactions.
default_with_oids | off | Create new tables with OIDs by default.
dynamic_library_path | $libdir | Sets the path for dynamically loadable modules.
dynamic_shared_memory_type | posix | Selects the dynamic shared memory implementation used.
effective_cache_size | 4GB | Sets the planner's assumption about the total size of the data caches.
effective_io_concurrency | 1 | Number of simultaneous requests that can be handled efficiently by the disk subsystem.
enable_bitmapscan | on | Enables the planner's use of bitmap-scan plans.
enable_gathermerge | on | Enables the planner's use of gather merge plans.
enable_hashagg | on | Enables the planner's use of hashed aggregation plans.
enable_hashjoin | on | Enables the planner's use of hash join plans.
enable_indexonlyscan | on | Enables the planner's use of index-only-scan plans.
enable_indexscan | on | Enables the planner's use of index-scan plans.
enable_material | on | Enables the planner's use of materialization.
enable_mergejoin | on | Enables the planner's use of merge join plans.
enable_nestloop | on | Enables the planner's use of nested-loop join plans.
enable_parallel_append | on | Enables the planner's use of parallel append plans.
enable_parallel_hash | on | Enables the planner's use of parallel hash plans.
enable_partition_pruning | on | Enable plan-time and run-time partition pruning.
enable_partitionwise_aggregate | off | Enables partitionwise aggregation and grouping.
enable_partitionwise_join | off | Enables partitionwise join.
enable_seqscan | on | Enables the planner's use of sequential-scan plans.
enable_sort | on | Enables the planner's use of explicit sort steps.
enable_tidscan | on | Enables the planner's use of TID scan plans.
escape_string_warning | on | Warn about backslash escapes in ordinary string literals.
event_source | PostgreSQL | Sets the application name used to identify PostgreSQL messages in the event log.
exit_on_error | off | Terminate session on any error.
external_pid_file | | Writes the postmaster PID to the specified file.
extra_float_digits | 0 | Sets the number of digits displayed for floating-point values.
force_parallel_mode | off | Forces use of parallel query facilities.
from_collapse_limit | 8 | Sets the FROM-list size beyond which subqueries are not collapsed.
fsync | on | Forces synchronization of updates to disk.
full_page_writes | on | Writes full pages to WAL when first modified after a checkpoint.
geqo | on | Enables genetic query optimization.
geqo_effort | 5 | GEQO: effort is used to set the default for other GEQO parameters.
geqo_generations | 0 | GEQO: number of iterations of the algorithm.
geqo_pool_size | 0 | GEQO: number of individuals in the population.
geqo_seed | 0 | GEQO: seed for random path selection.
geqo_selection_bias | 2 | GEQO: selective pressure within the population.
geqo_threshold | 12 | Sets the threshold of FROM items beyond which GEQO is used.
gin_fuzzy_search_limit | 0 | Sets the maximum allowed result for exact search by GIN.
gin_pending_list_limit | 4MB | Sets the maximum size of the pending list for GIN index.
hba_file | /var/lib/postgresql/data/pg_hba.conf | Sets the server's "hba" configuration file.
hot_standby | on | Allows connections and queries during recovery.
hot_standby_feedback | off | Allows feedback from a hot standby to the primary that will avoid query conflicts.
huge_pages | try | Use of huge pages on Linux or Windows.
ident_file | /var/lib/postgresql/data/pg_ident.conf | Sets the server's "ident" configuration file.
idle_in_transaction_session_timeout | 0 | Sets the maximum allowed duration of any idling transaction.
ignore_checksum_failure | off | Continues processing after a checksum failure.
ignore_system_indexes | off | Disables reading from system indexes.
integer_datetimes | on | Datetimes are integer based.
IntervalStyle | postgres | Sets the display format for interval values.
jit | off | Allow JIT compilation.
jit_above_cost | 100000 | Perform JIT compilation if query is more expensive.
jit_debugging_support | off | Register JIT compiled function with debugger.
jit_dump_bitcode | off | Write out LLVM bitcode to facilitate JIT debugging.
jit_expressions | on | Allow JIT compilation of expressions.
jit_inline_above_cost | 500000 | Perform JIT inlining if query is more expensive.
jit_optimize_above_cost | 500000 | Optimize JITed functions if query is more expensive.
jit_profiling_support | off | Register JIT compiled function with perf profiler.
jit_provider | llvmjit | JIT provider to use.
jit_tuple_deforming | on | Allow JIT compilation of tuple deforming.
join_collapse_limit | 8 | Sets the FROM-list size beyond which JOIN constructs are not flattened.
krb_caseins_users | off | Sets whether Kerberos and GSSAPI user names should be treated as case-insensitive.
krb_server_keyfile | FILE:/etc/postgresql-common/krb5.keytab | Sets the location of the Kerberos server key file.
lc_collate | en_US.utf8 | Shows the collation order locale.
lc_ctype | en_US.utf8 | Shows the character classification and case conversion locale.
lc_messages | en_US.utf8 | Sets the language in which messages are displayed.
lc_monetary | en_US.utf8 | Sets the locale for formatting monetary amounts.
lc_numeric | en_US.utf8 | Sets the locale for formatting numbers.
lc_time | en_US.utf8 | Sets the locale for formatting date and time values.
listen_addresses | * | Sets the host name or IP address(es) to listen to.
lo_compat_privileges | off | Enables backward compatibility mode for privilege checks on large objects.
local_preload_libraries | | Lists unprivileged shared libraries to preload into each backend.
lock_timeout | 0 | Sets the maximum allowed duration of any wait for a lock.
log_autovacuum_min_duration | -1 | Sets the minimum execution time above which autovacuum actions will be logged.
log_checkpoints | off | Logs each checkpoint.
log_connections | off | Logs each successful connection.
log_destination | stderr | Sets the destination for server log output.
log_directory | log | Sets the destination directory for log files.
log_disconnections | off | Logs end of a session, including duration.
log_duration | off | Logs the duration of each completed SQL statement.
log_error_verbosity | default | Sets the verbosity of logged messages.
log_executor_stats | off | Writes executor performance statistics to the server log.
log_file_mode | 0600 | Sets the file permissions for log files.
log_filename | postgresql-%Y-%m-%d_%H%M%S.log | Sets the file name pattern for log files.
log_hostname | off | Logs the host name in the connection logs.
log_line_prefix | %m [%p] | Controls information prefixed to each log line.
log_lock_waits | off | Logs long lock waits.
log_min_duration_statement | -1 | Sets the minimum execution time above which statements will be logged.
log_min_error_statement | error | Causes all statements generating error at or above this level to be logged.
log_min_messages | warning | Sets the message levels that are logged.
log_parser_stats | off | Writes parser performance statistics to the server log.
log_planner_stats | off | Writes planner performance statistics to the server log.
log_replication_commands | off | Logs each replication command.
log_rotation_age | 1d | Automatic log file rotation will occur after N minutes.
log_rotation_size | 10MB | Automatic log file rotation will occur after N kilobytes.
log_statement | none | Sets the type of statements logged.
log_statement_stats | off | Writes cumulative performance statistics to the server log.
log_temp_files | -1 | Log the use of temporary files larger than this number of kilobytes.
log_timezone | UTC | Sets the time zone to use in log messages.
log_truncate_on_rotation | off | Truncate existing log files of same name during log rotation.
logging_collector | off | Start a subprocess to capture stderr output and/or csvlogs into log files.
maintenance_work_mem | 64MB | Sets the maximum memory to be used for maintenance operations.
max_connections | 100 | Sets the maximum number of concurrent connections.
max_files_per_process | 1000 | Sets the maximum number of simultaneously open files for each server process.
max_function_args | 100 | Shows the maximum number of function arguments.
max_identifier_length | 63 | Shows the maximum identifier length.
max_index_keys | 32 | Shows the maximum number of index keys.
max_locks_per_transaction | 64 | Sets the maximum number of locks per transaction.
max_logical_replication_workers | 4 | Maximum number of logical replication worker processes.
max_parallel_maintenance_workers | 2 | Sets the maximum number of parallel processes per maintenance operation.
max_parallel_workers | 32 | Sets the maximum number of parallel workers that can be active at one time.
max_parallel_workers_per_gather | 32 | Sets the maximum number of parallel processes per executor node.
max_pred_locks_per_page | 2 | Sets the maximum number of predicate-locked tuples per page.
max_pred_locks_per_relation | -2 | Sets the maximum number of predicate-locked pages and tuples per relation.
max_pred_locks_per_transaction | 64 | Sets the maximum number of predicate locks per transaction.
max_prepared_transactions | 0 | Sets the maximum number of simultaneously prepared transactions.
max_replication_slots | 10 | Sets the maximum number of simultaneously defined replication slots.
max_stack_depth | 2MB | Sets the maximum stack depth, in kilobytes.
max_standby_archive_delay | 30s | Sets the maximum delay before canceling queries when a hot standby server is processing archived WAL data.
max_standby_streaming_delay | 30s | Sets the maximum delay before canceling queries when a hot standby server is processing streamed WAL data.
max_sync_workers_per_subscription | 2 | Maximum number of table synchronization workers per subscription.
max_wal_senders | 10 | Sets the maximum number of simultaneously running WAL sender processes.
max_wal_size | 1GB | Sets the WAL size that triggers a checkpoint.
max_worker_processes | 32 | Maximum number of concurrent worker processes.
min_parallel_index_scan_size | 512kB | Sets the minimum amount of index data for a parallel scan.
min_parallel_table_scan_size | 8MB | Sets the minimum amount of table data for a parallel scan.
min_wal_size | 80MB | Sets the minimum size to shrink the WAL to.
old_snapshot_threshold | -1 | Time before a snapshot is too old to read pages changed after the snapshot was taken.
operator_precedence_warning | off | Emit a warning for constructs that changed meaning since PostgreSQL 9.4.
parallel_leader_participation | on | Controls whether Gather and Gather Merge also run subplans.
parallel_setup_cost | 1000 | Sets the planner's estimate of the cost of starting up worker processes for parallel query.
parallel_tuple_cost | 0.1 | Sets the planner's estimate of the cost of passing each tuple (row) from worker to master backend.
password_encryption | md5 | Encrypt passwords.
port | 5432 | Sets the TCP port the server listens on.
post_auth_delay | 0 | Waits N seconds on connection startup after authentication.
pre_auth_delay | 0 | Waits N seconds on connection startup before authentication.
quote_all_identifiers | off | When generating SQL fragments, quote all identifiers.
random_page_cost | 4 | Sets the planner's estimate of the cost of a nonsequentially fetched disk page.
restart_after_crash | on | Reinitialize server after backend crash.
row_security | on | Enable row security.
search_path | "$user", public | Sets the schema search order for names that are not schema-qualified.
segment_size | 1GB | Shows the number of pages per disk file.
seq_page_cost | 1
work_mem | 4MB | Sets the maximum memory to be used for query workspaces.
Is it possible to reduce the execution time to a few minutes or less, or is this the expected execution time?
Best answer
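Only a few things will help with this query:
The actual scan does not seem to be the problem (it took 42 seconds), but it might be faster if the table could be kept in RAM.
Your main problem is the sort, which PostgreSQL already parallelizes.
There are a few things you can tune:
Increase work_mem as much as you can; that will make the sort faster.
Increase max_worker_processes (this requires a restart), max_parallel_workers, and max_parallel_workers_per_gather so that more cores can be used for the query.
PostgreSQL has internal logic to calculate the maximum number of parallel workers it is willing to use for a table: it plans one worker once the table reaches min_parallel_table_scan_size and one more each time the table size triples, which is roughly
log3(table size / min_parallel_table_scan_size)
You can force it to use more processes than that with:
ALTER TABLE ohlcv SET (parallel_workers = 20);
However, max_parallel_workers is still the upper limit.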
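As a rough check of the two tuning points above against the numbers in the question, here is a minimal Python sketch. The spill sizes and settings are copied from the posted plan and configuration. The heap page count is an assumption, back-derived from the Parallel Seq Scan cost estimate. The function `planned_workers` is a hypothetical helper that only approximates the planner's table-size heuristic; it ignores the caps imposed by max_parallel_workers_per_gather and max_parallel_workers, as well as any per-table parallel_workers setting.

```python
# Sort spill per process, in kB, from the EXPLAIN ANALYZE output
# (leader first, then workers 0..7).
SPILL_KB = [2429104, 2404680, 2406280, 2656672, 2635904,
            2637600, 2643400, 2437272, 2439272]
WORK_MEM_KB = 4 * 1024  # work_mem = 4MB in the posted configuration

# Assumption: heap size in 8 kB pages, back-derived from the plan cost:
# 5249780.48 - cpu_tuple_cost (0.01) * 76278848 rows, with seq_page_cost = 1.
HEAP_PAGES = 4_486_992
MIN_SCAN_PAGES = (8 * 1024) // 8  # min_parallel_table_scan_size = 8MB


def planned_workers(heap_pages: int, threshold_pages: int) -> int:
    """Approximate the table-size heuristic for parallel workers.

    No workers below the threshold, one worker at the threshold,
    and one more each time the table size triples.
    """
    if heap_pages < threshold_pages:
        return 0
    workers = 1
    while heap_pages >= threshold_pages * 3:
        workers += 1
        threshold_pages *= 3
    return workers


if __name__ == "__main__":
    total_kb = sum(SPILL_KB)
    print(f"total sort spill: {total_kb} kB ({total_kb / 1024**2:.1f} GiB)")
    print(f"largest spill per process: {max(SPILL_KB) / 1024**2:.2f} GiB")
    print(f"work_mem per sort: {WORK_MEM_KB / 1024:.0f} MiB")
    print("workers by size heuristic:",
          planned_workers(HEAP_PAGES, MIN_SCAN_PAGES))
```

With these inputs the heuristic gives 8 workers, which matches "Workers Planned: 8" in the plan even though the worker limits are set to 32. Each process also spilled gigabytes to disk against a 4 MB work_mem, which is why raising work_mem and setting parallel_workers on the table are the two levers suggested here.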
If there are no deletes or updates on the table and the data was inserted in sorted order, you can simply omit the ORDER BY clause, provided you set synchronize_seqscans = off.
Source: "sql - PostgreSQL sequential scan slow performance on 500 million rows" on Stack Overflow: https://stackoverflow.com/questions/56876223/