Can I write a ClickHouse SQL query that stops counting once a certain limit is reached?
Question
I have a table with a lot of rows. If the number of rows is greater than 100,000, I want to run a different follow-up query. I can, of course, do a simple COUNT to determine the number of rows. However, that keeps counting until the end (possibly millions of rows), and I only want to know whether the count exceeds 100,000. Here is the naive query:
SELECT COUNT(*)
FROM my_table
WHERE some_id = '....'
Can I make ClickHouse stop counting after reaching a certain threshold?
I already asked ChatGPT, but it came back with invalid SQL. When I pointed that out, it came back with another query that did not work either.
Answer 1
Score: 1
https://clickhouse.com/docs/en/operations/settings/query-complexity#max-rows-to-read
The setting max_rows_to_read can stop reading from a MergeTree table early.
create table T ( A Int64, B Int64)
Engine=MergeTree order by (A,B)
as select 1, number from numbers(1e8);
select count() from T where A=1;
┌───count()─┐
│ 100000000 │
└───────────┘
1 row in set. Elapsed: 0.128 sec. Processed 100.00 million rows, 800.00 MB (783.02 million rows/s., 6.26 GB/s.)
select count() from T where A=1
settings max_rows_to_read=20000, read_overflow_mode='break';
┌─count()─┐
│  261636 │
└─────────┘
1 row in set. Elapsed: 0.006 sec. Processed 261.64 thousand rows, 2.09 MB (41.13 million rows/s., 329.01 MB/s.)
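The count is 261,636 rather than 20,000 because ClickHouse reads in blocks of about 65k rows and uses multiple threads, so the limit is overshot before reading stops. Restricting the query to one thread and a smaller block size brings the result closer to the limit: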
select count() from T where A=1
settings max_rows_to_read=20000, read_overflow_mode='break',
max_threads=1, max_block_size=1000;
┌─count()─┐
│   25000 │
└─────────┘
1 row in set. Elapsed: 0.005 sec. Processed 25.00 thousand rows, 200.00 KB (5.54 million rows/s., 44.34 MB/s.)
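Applied to the original question, the same settings can serve as a cheap threshold test. The following is a minimal sketch in Python, not a definitive implementation. `run_scalar` is a hypothetical callable standing in for whatever client executes a query and returns the single count value; `capped_count_sql` and `exceeds_threshold` are illustrative names as well. Because the count can overshoot the limit, as shown above, the sketch only compares it against the threshold. It also assumes that, as with table T, the filtered column is a prefix of the sorting key, so nearly every row read matches the filter: max_rows_to_read limits rows read, not rows matched.

```python
from typing import Callable


def capped_count_sql(table: str, where: str, threshold: int) -> str:
    """Build a count query that stops reading soon after `threshold` rows.

    The table name and WHERE text are interpolated as-is for illustration;
    real code should not build SQL from untrusted input this way.
    """
    return (
        f"SELECT count() FROM {table} WHERE {where} "
        f"SETTINGS max_rows_to_read = {threshold}, read_overflow_mode = 'break'"
    )


def exceeds_threshold(
    run_scalar: Callable[[str], int],  # hypothetical: runs SQL, returns one number
    table: str,
    where: str,
    threshold: int,
) -> bool:
    """Return True if the (possibly overshooting) count is above `threshold`."""
    count = int(run_scalar(capped_count_sql(table, where, threshold)))
    # The value is exact when the table is read to the end, and only a
    # lower bound on the true count when reading was cut short.
    return count > threshold
```

With the numbers from the example above, a result of 261636 for a threshold of 20000 would be reported as exceeding the threshold, while the exact value would still require a full count.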
Answer 2
Score: 1
The LIMIT clause specifies the maximum number of rows a query returns, which lets ClickHouse stop reading once that many rows have been produced. The query below returns at most 100 rows from the my_table table:
SELECT *
FROM my_table
LIMIT 100;
A LIMIT placed directly on a COUNT query does not stop the counting: the aggregate produces a single row, and LIMIT only restricts the rows of the final result. A negative LIMIT does not cap the scan either. To stop counting after a certain threshold, put the LIMIT in a subquery and count the rows it returns. The following query returns at most 100,001 matching rows from the my_table table:
SELECT 1
FROM my_table
WHERE some_id = '....'
LIMIT 100001;
Use the following query to determine whether the number of matching rows in the my_table table is greater than 100,000:
SELECT count()
FROM
(
    SELECT 1
    FROM my_table
    WHERE some_id = '....'
    LIMIT 100001
);
If more than 100,000 rows match, the query returns 100,001; otherwise it returns the exact count. Use this value to determine whether to run the follow-up query.
For example, application code can branch on the returned value (pseudocode, not SQL):
if (count > 100000) {
    // run follow-up query
}
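The branching step can be sketched in application code. The version below is a minimal sketch in Python under stated assumptions: it counts the rows of a subquery capped with LIMIT threshold + 1, since a LIMIT applied directly to a COUNT query only restricts the single result row. `run_scalar` is a hypothetical callable for whatever client runs the query and returns one number, and `limited_count_sql`, `pick_follow_up`, `small_sql` and `large_sql` are illustrative names, not part of any ClickHouse API.

```python
from typing import Callable


def limited_count_sql(table: str, where: str, threshold: int) -> str:
    """Count at most `threshold + 1` matching rows via a LIMIT in a subquery.

    Identifiers and the WHERE text are interpolated as-is for illustration
    only; do not build SQL from untrusted input this way.
    """
    return (
        "SELECT count() FROM ("
        f"SELECT 1 FROM {table} WHERE {where} LIMIT {threshold + 1}"
        ")"
    )


def pick_follow_up(
    run_scalar: Callable[[str], int],  # hypothetical: runs SQL, returns one number
    table: str,
    where: str,
    threshold: int,
    small_sql: str,
    large_sql: str,
) -> str:
    """Return the follow-up query to run, based on the capped count."""
    count = int(run_scalar(limited_count_sql(table, where, threshold)))
    # The capped count never exceeds threshold + 1, so this comparison is
    # all the information it carries beyond exact counts at or below it.
    return large_sql if count > threshold else small_sql
```

The limit is threshold + 1 rather than threshold so that a table with exactly 100,000 matching rows can be told apart from one with more.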