Databricks: SQL Pivot does not work - but Python does

Question
I am trying to pivot a SQL table in Databricks on the `FeatureName` column, using the `FeatureDescription` column as the value.
But somehow the SQL query I have produces null values across the entire result table.
Here is the original table:
AddressKey | FeatureName | FeatureDescription |
---|---|---|
11101 | Groups | AG WAREHOUSE |
11101 | National Groups | AG WAREHOUSE |
5528 | National Groups | AIRR |
5528 | Groups | AUSTRALIAN INDEPENDENT RURAL RESELLERS |
Here is the SQL query:
```sql
SELECT *
FROM
(SELECT AddressKey, FeatureName, FeatureDescription
FROM radp_dev_cat.rdp_svr_sales.vw_customerfeatures) src
PIVOT (
max(FeatureDescription)
FOR FeatureName IN (
'Carcasspricecategory' AS Carcasspricecategory,
'Channel' AS Channel,
'Class' AS Class,
'Contractor' AS Contractor,
'DebtorCategory' AS DebtorCategory,
'DomesticCustTradingasExport' AS DomesticCustTradingasExport,
'FarmingOperations' AS FarmingOperations,
'FarmingProductionType' AS FarmingProductionType,
'FeatureName' AS FeatureName,
'FMInternalRegions' AS FMInternalRegions,
'Groups' AS Groups,
'MillITSupplierType' AS MillITSupplierType,
'NationalGroups' AS NationalGroups,
'PaymentMethod' AS PaymentMethod,
'PurchaseGroup' AS PurchaseGroup,
'RegisteredforGST' AS RegisteredforGST,
'SalesGroup' AS SalesGroup,
'SiteBaxters' AS SiteBaxters,
'SiteCorowaMeats' AS SiteCorowaMeats,
'SiteDVP' AS SiteDVP,
'SiteFeedmill' AS SiteFeedmill,
'SiteNSWWholesale' AS SiteNSWWholesale,
'Type' AS Type
)
);
```
The result:
AddressKey | Carcass price category | Channel | Class | Contractor | Debtor Category | Domestic Cust Trading as Export | FM Internal Regions | Farming Operations | Farming Production Type | Payment Method | Purchase Group | Registered for GST | Sales Group | Site Baxters | Site Corowa Meats | Site DVP | Site Feedmill | Site NSW Wholesale | Type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
24 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
1000 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
All null values for all columns across the table.
I tried using `any`, `first`, `max`, etc. for the aggregate function. Same result.
But it works when I do the same thing using Python:
```python
%python
import pandas as pd
df = spark.sql('SELECT * FROM radp_dev_cat.rdp_svr_sales.vw_customerfeatures')
df_pivot = df.groupBy('AddressKey').pivot('FeatureName').agg({"FeatureDescription": "first"}).toPandas()
df_pivot
```
AddressKey | Carcass price category | Channel | Class | Contractor | Debtor Category | Domestic Cust Trading as Export | FM Internal Regions | Farming Operations | Farming Production Type | Payment Method | Purchase Group | Registered for GST | Sales Group | Site Baxters | Site Corowa Meats | Site DVP | Site Feedmill | Site NSW Wholesale | Type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9 | None | MEAT WHOLESALE | MEAT | None | None | None | None | None | None | EFT | GRAIN GROWERS | No | None | No | No | No | Yes | No | BUTCHER |
24 | None | RETAIL | MEAT | None | None | None | None | None | None | EFT | None | None | None | None | None | None | None | None | BUTCHER |
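The PySpark chain above (groupBy, then pivot, then a `first` aggregate) can be mimicked in plain pandas. This is a minimal sketch with made-up rows, not the actual vw_customerfeatures data:

```python
import pandas as pd

# Toy rows shaped like the AddressKey / FeatureName / FeatureDescription table
df = pd.DataFrame({
    "AddressKey": [9, 9, 24],
    "FeatureName": ["Channel", "Type", "Channel"],
    "FeatureDescription": ["MEAT WHOLESALE", "BUTCHER", "RETAIL"],
})

# Equivalent of df.groupBy('AddressKey').pivot('FeatureName').agg(first(...)):
# each distinct FeatureName becomes a column, filled with the first
# FeatureDescription seen for that AddressKey
df_pivot = df.pivot_table(
    index="AddressKey",
    columns="FeatureName",
    values="FeatureDescription",
    aggfunc="first",
).reset_index()

print(df_pivot)
```

Keys that have no row for a given feature come out as NaN, matching the `None` cells in the pandas output above.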
Why does this happen? Is something wrong with my SQL query?
Thanks.
Answer 1

Score: 0
The problem was in the underlying view that supplies the FeatureName and FeatureDescription data: the records had trailing whitespace in those columns, so none of the PIVOT column labels matched.
FeatureName | len(FeatureName) | FeatureDescription | len(FeatureDescription) |
---|---|---|---|
Debtor Category | 50 | TRADE | 70 |
Type | 50 | DISTRIBUTOR | 70 |
Class | 50 | LOCAL | 70 |
I trimmed the underlying view and it worked.
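The effect can be reproduced in plain pandas (a hypothetical sketch, not the real view): a label padded with trailing whitespace never equals the clean label, so the clean column comes out missing, and trimming first fixes it. In the SQL view the equivalent fix is wrapping the column in `trim(FeatureName)`.

```python
import pandas as pd

# 'Channel ' has a trailing space, like the padded columns in the view
df = pd.DataFrame({
    "AddressKey": [9, 24],
    "FeatureName": ["Channel ", "Channel "],
    "FeatureDescription": ["MEAT WHOLESALE", "RETAIL"],
})

# Pivoting the padded labels: the resulting column is 'Channel ',
# so looking up the clean label 'Channel' finds nothing
wide = df.pivot_table(index="AddressKey", columns="FeatureName",
                      values="FeatureDescription", aggfunc="first")
print("Channel" in wide.columns)   # False: the only column is 'Channel '

# Trim the labels first, then pivot
df["FeatureName"] = df["FeatureName"].str.strip()
fixed = df.pivot_table(index="AddressKey", columns="FeatureName",
                       values="FeatureDescription", aggfunc="first")
print(fixed["Channel"].tolist())   # ['MEAT WHOLESALE', 'RETAIL']
```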
Thanks for the insight @nbk.