Can I write a ClickHouse SQL query that stops counting after a certain limit is reached?

Question

I have a table with a lot of rows. If the number of rows is greater than 100,000, I want to run a different follow-up query. I can, of course, do a simple COUNT to determine the number of rows. However, that keeps counting until the end (possibly millions of rows), and I only want to know whether the count is more than 100,000. Here is the naive query:

SELECT COUNT(*)
FROM my_table
WHERE some_id = '....'

Can I make ClickHouse stop counting after reaching a certain threshold?

I already asked ChatGPT but it came back with illegal SQL. When I told it so, it came back with another query that did not work.

Answer 1

Score: 1

https://clickhouse.com/docs/en/operations/settings/query-complexity#max-rows-to-read

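The setting max_rows_to_read can stop the reading of a MergeTree table early. Combined with read_overflow_mode='break', the query returns a partial result once the limit is exceeded instead of throwing an exception.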

create table T ( A Int64, B Int64) 
Engine=MergeTree order by (A,B) 
as select 1, number from numbers(1e8);

select count() from T where A=1;
┌───count()─┐
│ 100000000 │
└───────────┘
1 row in set. Elapsed: 0.128 sec. Processed 100.00 million rows, 800.00 MB (783.02 million rows/s., 6.26 GB/s.)


select count() from T where A=1 
settings max_rows_to_read=20000, read_overflow_mode='break';
┌─count()─┐
│  261636 │
└─────────┘
1 row in set. Elapsed: 0.006 sec. Processed 261.64 thousand rows, 2.09 MB (41.13 million rows/s., 329.01 MB/s.)

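The count (261,636) is not equal to the limit (20,000) because ClickHouse reads in blocks of about 65k rows and uses multiple threads, so the limit can be overshot before reading stops. Restricting the query to a single thread and a smaller block size brings the count much closer to the limit: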

select count() from T where A=1 
settings max_rows_to_read=20000, read_overflow_mode='break',
max_threads=1, max_block_size=1000;
┌─count()─┐
│   25000 │
└─────────┘
1 row in set. Elapsed: 0.005 sec. Processed 25.00 thousand rows, 200.00 KB (5.54 million rows/s., 44.34 MB/s.)
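
Because break mode returns a partial result that can overshoot the limit, the count it reports is better read as a lower bound than as an exact figure. The following Python snippet is a minimal sketch of how a client might build such a query and interpret the result. The function names are illustrative assumptions, not part of any ClickHouse client library, and sending the SQL to a server is deliberately left out.

```python
def capped_read_query(table: str, condition: str, max_rows: int) -> str:
    """Build a count query that stops reading after roughly max_rows rows.

    The arguments are interpolated directly into the SQL text, so this
    sketch is only suitable for trusted, hard-coded input.
    """
    return (
        f"SELECT count() FROM {table} WHERE {condition} "
        f"SETTINGS max_rows_to_read = {int(max_rows)}, "
        "read_overflow_mode = 'break'"
    )


def definitely_exceeds(partial_count: int, threshold: int) -> bool:
    """A partial count never exceeds the true count, so only
    'greater than the threshold' is a conclusive answer."""
    return partial_count > threshold


# Table T and the numbers below are taken from the example in this answer.
print(capped_read_query("T", "A = 1", 20000))
print(definitely_exceeds(261636, 20000))
```

A result of False is not conclusive in general: if the WHERE clause discards many of the rows that were read, reading may stop before enough matching rows have been counted.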

Answer 2

Score: 1

You can make ClickHouse stop counting after a certain threshold by using the LIMIT clause, which specifies the maximum number of rows a query returns. The following query returns at most 100 rows from my_table:

SELECT *
FROM my_table
LIMIT 100;

To stop counting after a certain threshold, apply the LIMIT inside a subquery rather than to the aggregate itself. A LIMIT on the outer query only restricts the one-row result of COUNT(*), and a negative LIMIT is not a way to cap a count. The following query returns at most 100,001 matching rows, so ClickHouse can stop reading once that many have been found:

SELECT 1
FROM my_table
WHERE some_id = '....'
LIMIT 100001;

Use the following query to determine whether the number of matching rows in my_table is greater than 100,000:

SELECT COUNT(*)
FROM
(
    SELECT 1
    FROM my_table
    WHERE some_id = '....'
    LIMIT 100001
);

If more than 100,000 rows match, COUNT(*) returns 100,001; otherwise it returns the exact number of matching rows. Use this value to decide whether to run the follow-up query.

For example, the calling code can run the follow-up query when the returned count is greater than 100,000 (pseudocode):

if (count > 100000) {
   -- run follow-up query
}
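
The branch on the count has to happen in the client, because a plain SELECT cannot choose which follow-up query to run. The following Python snippet is a minimal sketch of that control flow. It assumes a count that is capped by a LIMIT inside a subquery, and a hypothetical `run_scalar` callable that executes SQL and returns the single integer in the result. Neither function name comes from a real ClickHouse client library.

```python
from typing import Callable


def capped_count_query(table: str, condition: str, threshold: int) -> str:
    """Count matching rows, but never more than threshold + 1 of them.

    The arguments are interpolated directly into the SQL text, so this
    sketch is only suitable for trusted, hard-coded input.
    """
    return (
        "SELECT count() FROM "
        f"(SELECT 1 FROM {table} WHERE {condition} LIMIT {int(threshold) + 1})"
    )


def exceeds_threshold(run_scalar: Callable[[str], int], table: str,
                      condition: str, threshold: int) -> bool:
    """Return True when more than `threshold` rows match the condition."""
    sql = capped_count_query(table, condition, threshold)
    return run_scalar(sql) > threshold


def fake_run_scalar(sql: str) -> int:
    """Stand-in for a real client call, used only to show the control flow."""
    return 100001


if exceeds_threshold(fake_run_scalar, "my_table", "some_id = '....'", 100000):
    print("run the follow-up query for large result sets")
else:
    print("run the regular follow-up query")
```

In real use, `fake_run_scalar` would be replaced by a function that sends the SQL to the server through whichever client is available.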

Posted by huangapple on June 12, 2023 at 22:39:58
Source: https://go.coder-hub.com/76457741.html