插入值在Oracle数据库中 – 不同的方法

huangapple go评论56阅读模式
英文:

Inserting values in Oracle - different methods

问题

I read about different methods how to insert values in the Oracle database, like insert, insert all, CTAS, insert into select, SQL Loader but I would like to understand what is the reason that one is faster than the other.

例如,如果我有10000个值。
使用最简单的方法,这也是最差的方法之一

 insert into table_1 values (1); insert into table_1 values (2);....insert into table_1 values (10000);

这很慢,但我的问题是,与INSERT INTO SELECT或CTAS相比,发生了什么情况?

当然,CTAS首先创建一个表,但与insert into select一样,它必须将值输入到一个"表"中,并将其保存到文件中。假设表是空的(在CTAS的情况下根本没有表),那么多个简单的插入语句为什么如此缓慢。

从物理上讲,它必须执行相同的操作,但可能每种方法都有不同的开销。

a. 简单的插入语句但多次执行
b. insert into select
c. insert all语句
d. SQL LOADER
e. CTAS

除了最后一种方法,假设表是

  1. 空的
  2. 稀疏的 - 在插入之前删除了一些行
英文:

I read about different methods how to insert values in oracle database, like insert, insert all, CTAS, insert into select, SQL Loader but I would like to understand what is the reason that one is faster than the other.

for example if i have 10000 values.
using the simplest method which is the worst one

 insert into table_1 values ( 1) ; insert into table_1 values (2);....insert into table_1 values (10000);

this is very slow but my question what is happening compared to INSERT INTO SELECT or CTAS

Of course CTAS create a table first but as insert into select it must enter the value in a "table" which is a block and save it into the file. Suppose the table is empty (no table at all in case of CTAS) , what is happening that multiple simple insert statements are so slow.

physically it must do the same thing but probably each method has different overheads

a. simple insert statements but multiple times
b. insert into select
c. insert all statement
d. SQL LOADER
e. CTAS

except the last method suppose the table is

  1. empty
  2. sparse - some rows has been deleted before the insert

答案1

得分: 1

发生了什么事情,导致多个简单的插入语句如此缓慢。

从简单的角度来看,每个语句都会:

  • 被 SQL 引擎解析;
  • 写入日志文件(即归档日志、撤销/重做日志); 和
  • 写入数据文件。

当您使用多个 INSERT 语句时,就会有多次解析、多次写入日志文件,以及(可能)多次写入数据文件。

然而,当您使用单个 INSERT ALL ...INSERT ... SELECT ...CREATE TABLE ... AS SELECT ... 时,只需要进行一次(可能更大的)解析,一次(更大的)写入日志文件和数据文件。这意味着您可以显著减少 I/O 开销,如果这些语句在网络上运行,还可以减少传输开销。

CREATE TABLE ... AS SELECT ... 是一个DDL语句,在语句之前和之后会隐式地COMMIT。因此,很难与其他语句进行直接比较,因为它并不完全相同。

正如@astentx提到的,SQL Loader 使用了直接路径插入,绕过了缓冲区缓存,直接写入数据文件;然而,在某些情况下,您也可以使用INSERT语句进行直接路径插入。

很可能无法确定哪种方法是“最佳”的,但您应该发现多个INSERT语句的开销要远远大于其他选项,并且是最慢的选项。

英文:

> what is happening that multiple simple insert statements are so slow.

Simplistically, each statement will:

  • Be parsed by the SQL engine;
  • Write to the log files (i.e. archive log, undo/redo log); and
  • Get written to data file.

When you use multiple INSERT statements then you have multiple parses, multiple writes to the log files and (maybe) multiple writes to the data file.

However, when you have a single INSERT ALL ... or INSERT ... SELECT ... or CREATE TABLE ... AS SELECT ... then there only needs to be a single (possibly larger) parse and a single (larger) write to the log file and data file. This means that you can reduce the I/O overheads significantly and if the statements are being run over a network then you will reduce transmission overheads as well.

CREATE TABLE ... AS SELECT ... is a DDL statement and will implicitly COMMIT both before and after the statement. Therefore, it is difficult to compare to the other statements as it is not doing exactly the same thing.

As mentioned by @astentx, SQL Loader uses a direct-path insert which bypasses the buffer cache and writes directly to the data files; however, under certain circumstances you can also use direct-path insert with INSERT statements.

It would probably be impossible to say which method is the "best" but you should find that multiple INSERT statements has far greater overheads that any of the other options and would be the slowest option.

答案2

得分: 1

  1. INSERT INTO tbl VALUES 仅适用于单行(除非使用批量绑定,例如使用 OCI SYS_DL_CURSORFORALL INSERT VALUES 或 SQL*Loader 直接路径),因此相对于所完成的工作量,您会有过多的解析、执行、缓冲并发机制、重做和撤消条目、索引维护和网络往返的开销。对于一个数据行,它很快,但对于一百万行,效率很低。

  2. INSERT INTO tbl SELECT 适用于行源,可以一次性处理多行。因此,您不需要为每一行处理的网络往返、解析和执行开销。

  3. 使用 INSERT /*+ APPEND */ INTO tbl SELECT 进一步改善了性能,它使用直接路径(绕过缓冲区缓存/DBWR,直接写入磁盘),并且在高水位标记(HWM)上方完全格式化新块,而不是在现有块中找到空闲空间。这也意味着它不需要为数据本身存储撤消信息,因为回滚可以简单地丢弃新分配的块,所以它不需要为每一行存储撤消信息。如果没有强制记录日志,并且在提示或表中添加 NOLOGGING,这种插入还会绕过重做。这样可以加快速度(但不可恢复)。

  4. INSERT INTO tbl SELECT /*+ PARALLEL(x) */ ... 通过在查询的 SELECT 部分使用多个并行执行从属,加快了速度,这有助于处理表连接或带有过滤条件的全表扫描...对于直接的 SELECT * FROM table 不会有帮助,因为并行从属会在串行插入负载步骤上受限。

  5. INSERT /*+ ENABLE_PARALLEL_DML APPEND PARALLEL(x) */ INTO tbl SELECT 通过将加载步骤本身并行化,实际块格式化和加载,进一步加快了速度。它还将索引维护推迟到加载之后,如果有索引,这会进一步提高速度。

  6. CREATE TABLE tbl AS SELECT(CTAS)与 PARALLEL (DEGREE x)(特别是带有 NOLOGGING)是最快的,因为它获得了上述所有优势,而且不需要撤消、锁定、零索引维护、用于检查空间位图的数据段头查找等...它正在处理自己全新的段,所以这些都是不必要的。我反复发现,相较于已存在的空表的并行追加插入,CTAS 的性能更好,但是如果按照上述方法仔细设置插入,您可以接近CTAS的性能。

英文:
  1. INSERT INTO tbl VALUES works on only a single row (unless using an bulk bind such as with OCI SYS_DL_CURSOR or FORALL INSERT VALUES or SQL*Loader direct path) and as such you have an excessive amount of overhead for parse, execution, buffer concurrency mechanisms, redo & undo entries, index maintenance, and network round trip relative to the amount of work accomplished. It's quick with one row, but miserable with a million rows.

  2. INSERT INTO tbl SELECT works on a row source, as many rows as you'd like all at once. Therefore you don't have the network round trip, parsing and execution overhead for every row.

  3. INSERT /*+ APPEND */ INTO tbl SELECT improves things further by using direct path (bypasses the buffer cache/DBWR and writes directly to disk) and also formats completely new blocks above the high-water mark (HWM) rather than finding empty space in existing blocks. This also means that it doesn't need to store undo for the data itself, as a rollback can simply discard the newly allocated blocks and so doesn't need undo information for each row. If logging is not being forced and you add NOLOGGING to the hints or the table, this kind of insert will also bypass redo. This really speeds things up (but makes it unrecoverable).

  4. INSERT INTO tbl SELECT /*+ PARALLEL(x) */ ... speeds things up by throwing multiple parallel execution slaves at the SELECT portion of the query, which helps only if you are doing table joins or full scan with a filter predicate... it won't help with a straight SELECT * FROM table because the parallel slaves will be bottlenecked on the serial insert load step.

  5. INSERT /*+ ENABLE_PARALLEL_DML APPEND PARALLEL(x) */ INTO tbl SELECT further speeds things up by parallelizing the load step itself, the actual block formatting and loading. It also postpones index maintenance until after the load which further speeds things up if you have indexes.

  6. CREATE TABLE tbl AS SELECT (CTAS) with PARALLEL (DEGREE x) (and especially with NOLOGGING) is the fastest of all because it gets the benefit of all of the above plus there's no need for undo, locking, zero index maintenance, data segment header lookups for checking space bitmaps, etc.. it's working on its own brand new segment so none of these are necessary. I have repeatedly found that CTAS outperforms parallel append insert of an existing, empty table, though if you set up your insert carefully as outlined above, you can get close to CTAS performance.

huangapple
  • 本文由 发表于 2023年5月22日 13:02:46
  • 转载请务必保留本文链接:https://go.coder-hub.com/76303157.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定