How could a database have worse benchmark results on a faster disk?
Question
I'm benchmarking comparable (2vCPU, 2G RAM) servers (Ubuntu 18.04) from DigitalOcean (DO) and AWS EC2 (t3a.small).
The disk benchmark (fio) results are similar to those in https://dzone.com/articles/iops-benchmarking-disk-io-aws-vs-digitalocean.
In summary:
DO --
READ: bw=218MiB/s (229MB/s), 218MiB/s-218MiB/s (229MB/s-229MB/s), io=3070MiB (3219MB), run=14060-14060msec
WRITE: bw=72.0MiB/s (76.5MB/s), 72.0MiB/s-72.0MiB/s (76.5MB/s-76.5MB/s), io=1026MiB (1076MB), run=14060-14060msec
EC2 --
READ: bw=9015KiB/s (9232kB/s), 9015KiB/s-9015KiB/s (9232kB/s-9232kB/s), io=3070MiB (3219MB), run=348703-348703msec
WRITE: bw=3013KiB/s (3085kB/s), 3013KiB/s-3013KiB/s (3085kB/s-3085kB/s), io=1026MiB (1076MB), run=348703-348703msec
This shows that DO's disk is more than 10 times faster than EC2's EBS.
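(For reference: the io totals above, roughly 3 GiB read and 1 GiB written per run, are what a 4 GiB mixed job at a 75/25 read/write split produces, so the test was presumably something along the lines of the fio command from the linked article. The file name and exact flags below are assumptions, not the command actually used.)

    fio --name=benchtest --filename=benchtest.file --ioengine=libaio --direct=1 \
        --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

Running the identical job on both machines keeps the comparison apples-to-apples; only the bw=, iops, and latency lines of the output should differ.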
However, sysbench results following https://severalnines.com/database-blog/how-benchmark-postgresql-performance-using-sysbench indicate that DO is slower than EC2 (using Postgres 11 default configuration, read-write test on oltp_legacy/oltp.lua).
DO --
transactions: 14704 (243.87 per sec.)
Latency (ms):
min: 9.06
avg: 261.77
max: 2114.04
95th percentile: 383.33
EC2 --
transactions: 20298 (336.91 per sec.)
Latency (ms):
min: 5.85
avg: 189.47
max: 961.27
95th percentile: 215.44
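(For reference, these numbers come from a sysbench run in the style of the severalnines guide; the invocation is roughly of the following form, where table count, table size, thread count, and duration are assumptions rather than the exact values used.)

    sysbench /usr/share/sysbench/tests/include/oltp_legacy/oltp.lua \
        --db-driver=pgsql --pgsql-host=127.0.0.1 --pgsql-port=5432 \
        --pgsql-user=sbtest --pgsql-password=password --pgsql-db=sbtest \
        --oltp-tables-count=24 --oltp-table-size=100000 \
        --threads=64 --time=60 --report-interval=10 run

The same script is first run with "prepare" instead of "run" to create and populate the sbtest tables.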
What could be the explanation?
Answer 1
Score: 3
Sequential read/write throughput matters for tasks like large sequential scans, such as data warehousing or loading a large backup.
Your benchmark focuses on OLTP, which involves many small, quick queries. For OLTP, sequential throughput is not relevant.
Regarding reads (SELECTs), having enough RAM to keep your working set in cache is crucial to avoid actual IO. If that's not possible, read random access time becomes important.
For writes (UPDATE, INSERT), fsync latency, the time to commit data to stable storage, is the critical factor since a COMMIT in the database only finishes when data is written.
It's likely that EC2 offers better random access and fsync performance, possibly using SSDs or battery-backed cache.
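One way to check that directly is to measure commit-style IO on both machines, for example with pg_test_fsync (shipped with PostgreSQL; the path below assumes the Ubuntu PostgreSQL 11 package) and an fio job that issues an fsync after every write:

    /usr/lib/postgresql/11/bin/pg_test_fsync -s 5

    fio --name=fsync-latency --ioengine=sync --rw=write --bs=8k --size=256m --fsync=1

pg_test_fsync reports per-operation latency for the different wal_sync_method settings, and the fio job roughly mimics one small WAL write plus fsync per commit; comparing those latencies between DO and EC2 says far more about OLTP performance than comparing MiB/s.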
Answer 2
Score: 0
Sequential bandwidth and latency/IOPS are independent parameters.
Some workloads (like databases) depend on the latency of many small IOs, or on the throughput of many small IO operations, i.e. IOPS (IOs per second).
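One way to measure those, rather than sequential bandwidth, is a small-block random fio job at queue depth 1, roughly like this (file name, size, and runtime are arbitrary placeholders):

    fio --name=randread-latency --filename=latencytest.file --ioengine=libaio --direct=1 \
        --rw=randread --bs=4k --iodepth=1 --size=1G --runtime=30 --time_based

The lat and iops figures in that output, not the bw= numbers quoted in the question, are what an OLTP database is sensitive to.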
Answer 3
Score: 0
In addition to the IOPS vs. throughput point others have made, I also want to point out that the two results are pretty similar numbers: 240 tps vs. 330 tps. You could add or subtract almost that much just by doing things like VACUUM or ANALYZE, or by letting the system sit idle for a while.
There could be other factors too. CPU speed could differ, one provider may allow short bursts while throttling a sustained heavy user, huge_pages may be present or absent, and cache timings, memory speeds, or NVMe drivers may differ. The point is that 240 is not as far below 330 as you might think.
Update: something else to point out is that OLTP read/write transactions aren't necessarily bottlenecked by disk performance; if you turn sync off, they really aren't.
I don't know exactly what the sysbench legacy OLTP read/write test does, but I suspect it is more like a bank transaction touching multiple records and using indexes; it's probably not some sort of raw maximum-insertion-rate or maximum-CRUD-operation-rate benchmark.
On my desktop I get about 1000 tps in the write-heavy benchmark against pg13, yet during bulk loads I can insert something like 50k records per second (each around 100 bytes) from just a single-process Python client, and nearly 100k with sync off.
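If anyone wants to reproduce that "sync off" comparison, it presumably refers to PostgreSQL's synchronous_commit setting (the milder alternative to turning fsync off entirely); assuming a superuser named postgres, it can be toggled like this:

    psql -U postgres -c "ALTER SYSTEM SET synchronous_commit = off;" -c "SELECT pg_reload_conf();"

With synchronous_commit = off, COMMIT no longer waits for the WAL to be flushed to disk, so the benchmark stops measuring fsync latency and starts measuring CPU, locking, and cache behaviour instead; the trade-off is that the last few transactions can be lost after a crash, though the database itself stays consistent.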