How to count the columns from tables in Databricks in SQL

Question

How can I count the number of columns in a table from a SQL notebook in Databricks? Querying information_schema.columns is not working.

I was trying to run the query below:

select 'data_beat' Table_Name
      ,max(_createTime_) as Last_Create_Date
      ,max(_upt_) as Last_Update_Date
      ,(select count(*)
          from information_schema.columns
         where Table_name = 'data_beat'
       ) as Column_count
      ,(select count(*)
          from data_lineage.field_dependencies
         where source like 'ett.data_beat%'
       ) as Count_to_L1
  from ett.data_beat

Expected Results are:

| Table_Name    | Last_Update_Date | Last_Create_Date | Columns_Count | Count_to_L1 |
| ------------- | ---------------- | ---------------- | ------------- | ----------- |
| ett.data_beat | 5/10/2023        | 5/10/2023        | 51            | 12          |

Answer 1

Score: 0

I tried this in my environment and got the same error.

According to the Databricks documentation, and as far as I understand it, INFORMATION_SCHEMA requires Unity Catalog, and it only describes tables that belong to a Unity Catalog catalog. That may be why INFORMATION_SCHEMA is not recognized in your workspace.

So make sure your tables are in a Unity Catalog catalog if you want to use INFORMATION_SCHEMA.
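If the table does live in Unity Catalog, a query along these lines may work. This is only a sketch: the catalog name `main` is a placeholder I made up, `column_count_query` is a hypothetical helper, and I am assuming `ett` is the schema from the question. The helper just builds the SQL text, so it can be checked outside Databricks.

```python
def column_count_query(catalog: str, schema: str, table: str) -> str:
    """Build a query that counts a table's columns via <catalog>.information_schema.columns.

    The names are interpolated directly into the SQL text, so only pass
    trusted identifiers (this is a sketch, not hardened code).
    """
    return (
        "SELECT count(*) AS column_count "
        f"FROM {catalog}.information_schema.columns "
        f"WHERE table_schema = '{schema}' AND table_name = '{table}'"
    )


# "main" is an assumed catalog name; replace it with your own catalog.
query = column_count_query("main", "ett", "data_beat")
print(query)

# In a Databricks notebook, where `spark` is predefined, you could then run:
# spark.sql(query).show()
```

Qualifying information_schema with the catalog name avoids depending on whichever catalog happens to be the current one in the notebook.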

Another workaround is to put a small PySpark cell between the SQL cells: store the column details in a temporary view, then query that view to get the count.

My Sample:

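%python
# Cell 1: SHOW COLUMNS returns one row per column; expose the result to SQL
df = spark.sql("SHOW COLUMNS FROM sample1")
df.createOrReplaceTempView("columns_sample1")

%sql
-- Cell 2: one row per column, so the row count is the column count
select count(*) as count from columns_sample1;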
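If you only need the number itself and not a view, the temp-view step can probably be skipped by reading the column list straight off the DataFrame. A minimal sketch, where `count_columns` is a hypothetical helper name and `spark` is the session that Databricks notebooks predefine:

```python
def count_columns(spark, table_name: str) -> int:
    """Return the number of columns of a table or view visible to this Spark session."""
    # DataFrame.columns is a plain Python list of column names,
    # so this reads the schema only and does not scan the table's data.
    return len(spark.table(table_name).columns)


# In a Databricks notebook:
# count_columns(spark, "sample1")
```

This works for tables outside Unity Catalog as well, since it does not rely on INFORMATION_SCHEMA.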

huangapple
  • Posted on May 10, 2023, 20:05:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76218194.html