2023-06-12 22:39:58

Can I write a ClickHouse SQL query that stops counting after a certain limit is reached?

Question

I have a table with a lot of rows. If the number of rows is greater than 100,000, I want to run a different follow-up query. I can, of course, do a simple COUNT to determine the number of rows. However, that keeps counting until the end (possibly millions of rows), and I only want to know whether the count is more than 100,000. Here is the naive query:

SELECT COUNT(*)
FROM my_table
WHERE some_id = '....';

Can I make ClickHouse stop counting after reaching a certain threshold?

I already asked ChatGPT but it came back with illegal SQL. When I told it so, it came back with another query that did not work.

Answer 1

Score: 1

https://clickhouse.com/docs/en/operations/settings/query-complexity#max-rows-to-read

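The max_rows_to_read setting can stop the reading of a MergeTree table early. With read_overflow_mode='break', ClickHouse returns a partial result once the limit is reached instead of throwing an exception.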

create table T ( A Int64, B Int64) 
Engine=MergeTree order by (A,B) 
as select 1, number from numbers(1e8);

select count() from T where A=1;
┌───count()─┐
│ 100000000 │
└───────────┘
1 row in set. Elapsed: 0.128 sec. Processed 100.00 million rows, 800.00 MB (783.02 million rows/s., 6.26 GB/s.)


select count() from T where A=1
settings max_rows_to_read=20000, read_overflow_mode='break';
┌─count()─┐
│  261636 │
└─────────┘
1 row in set. Elapsed: 0.006 sec. Processed 261.64 thousand rows, 2.09 MB (41.13 million rows/s., 329.01 MB/s.)

The count is 261636 rather than 20000 because ClickHouse reads in blocks of about 65k rows and uses multiple threads, and the limit is checked per block rather than per row.

select count() from T where A=1
settings max_rows_to_read=20000, read_overflow_mode='break',
max_threads=1, max_block_size=1000;
┌─count()─┐
│   25000 │
└─────────┘
1 row in set. Elapsed: 0.005 sec. Processed 25.00 thousand rows, 200.00 KB (5.54 million rows/s., 44.34 MB/s.)
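
Applied to the table from the question, a hedged sketch might look like the query below. It assumes some_id is the leading column of the table's ORDER BY key, so that nearly every row read is a row that matches. max_rows_to_read limits rows read, not rows matched, and the examples above show that the partial count overshoots the limit, so the result is only a rough indicator near the threshold. The alias probably_over_threshold is illustrative.

```sql
-- Sketch only: stop reading after roughly 100,000 rows and report whether
-- the partial count reached the threshold.
-- Assumes some_id leads the ORDER BY key of my_table (not stated in the question).
SELECT count() >= 100000 AS probably_over_threshold
FROM my_table
WHERE some_id = '....'
SETTINGS max_rows_to_read = 100000, read_overflow_mode = 'break';
```

If the filter column is not covered by the primary key, many of the rows read will not match, and the partial count can stay below the threshold even when more than 100,000 matching rows exist.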

Answer 2

Score: 1

The LIMIT clause lets you specify the maximum number of rows that a query returns. The query below returns at most 100 rows from the my_table table:

SELECT *
FROM my_table
LIMIT 100;

Adding LIMIT (negative or otherwise) to the COUNT query itself does not stop the count early. LIMIT applies to the rows the query returns, and an aggregate without GROUP BY returns a single row, so every matching row is still counted. To stop after a certain threshold, put the LIMIT in a subquery and count the rows that the subquery produces:

SELECT count()
FROM (
    SELECT 1
    FROM my_table
    WHERE some_id = '....'
    LIMIT 100001
);

If more than 100,000 rows match, this query returns 100001; otherwise it returns the exact number of matching rows. ClickHouse can stop reading shortly after the subquery's limit is satisfied. Use the returned value in the calling code to decide whether to run the follow-up query.

For example, in pseudocode, where row_count holds the value returned above:

if (row_count > 100000) {
   // run the follow-up query
}
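
A variant that returns the yes/no answer directly can be sketched as follows. The LIMIT sits in a subquery so that at most 100,001 matching rows are produced, and the outer query compares their count with the threshold. The alias over_threshold is illustrative and not part of the original question.

```sql
-- Sketch only: 1 if more than 100,000 rows match, 0 otherwise.
-- The inner LIMIT caps how many matching rows are produced, so the outer
-- count() never has to go past 100,001.
SELECT count() > 100000 AS over_threshold
FROM (
    SELECT 1
    FROM my_table
    WHERE some_id = '....'
    LIMIT 100001
);
```

Selecting the constant 1 instead of * avoids reading columns other than those needed for the filter.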
