Databricks: SQL Pivot does not work - but Python does

Question
I am trying to pivot a SQL table in Databricks on the `FeatureName` column, using the `FeatureDescription` column as the value.
But somehow the SQL query I have produces null values across the entire result table.
Here is the original table:
AddressKey | FeatureName | FeatureDescription |
---|---|---|
11101 | Groups | AG WAREHOUSE |
11101 | National Groups | AG WAREHOUSE |
5528 | National Groups | AIRR |
5528 | Groups | AUSTRALIAN INDEPENDENT RURAL RESELLERS |
Here is the SQL query:
```sql
SELECT *
FROM
(SELECT AddressKey, FeatureName, FeatureDescription
FROM radp_dev_cat.rdp_svr_sales.vw_customerfeatures) src
PIVOT (
max(FeatureDescription)
FOR FeatureName IN (
'Carcasspricecategory' AS Carcasspricecategory,
'Channel' AS Channel,
'Class' AS Class,
'Contractor' AS Contractor,
'DebtorCategory' AS DebtorCategory,
'DomesticCustTradingasExport' AS DomesticCustTradingasExport,
'FarmingOperations' AS FarmingOperations,
'FarmingProductionType' AS FarmingProductionType,
'FeatureName' AS FeatureName,
'FMInternalRegions' AS FMInternalRegions,
'Groups' AS Groups,
'MillITSupplierType' AS MillITSupplierType,
'NationalGroups' AS NationalGroups,
'PaymentMethod' AS PaymentMethod,
'PurchaseGroup' AS PurchaseGroup,
'RegisteredforGST' AS RegisteredforGST,
'SalesGroup' AS SalesGroup,
'SiteBaxters' AS SiteBaxters,
'SiteCorowaMeats' AS SiteCorowaMeats,
'SiteDVP' AS SiteDVP,
'SiteFeedmill' AS SiteFeedmill,
'SiteNSWWholesale' AS SiteNSWWholesale,
'Type' AS Type
)
);
```
The result:
AddressKey | Carcass price category | Channel | Class | Contractor | Debtor Category | Domestic Cust Trading as Export | FM Internal Regions | Farming Operations | Farming Production Type | Payment Method | Purchase Group | Registered for GST | Sales Group | Site Baxters | Site Corowa Meats | Site DVP | Site Feedmill | Site NSW Wholesale | Type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
24 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
1000 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
All null values for all columns across the table.
I tried using `any`, `first`, `max`, etc. for the aggregate function. Same result.
But it works when I do the same thing using Python:
```python
%python
import pandas as pd
df = spark.sql('SELECT * FROM radp_dev_cat.rdp_svr_sales.vw_customerfeatures')
df_pivot = df.groupBy('AddressKey').pivot('FeatureName').agg({"FeatureDescription": "first"}).toPandas()
df_pivot
```
AddressKey | Carcass price category | Channel | Class | Contractor | Debtor Category | Domestic Cust Trading as Export | FM Internal Regions | Farming Operations | Farming Production Type | Payment Method | Purchase Group | Registered for GST | Sales Group | Site Baxters | Site Corowa Meats | Site DVP | Site Feedmill | Site NSW Wholesale | Type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9 | None | MEAT WHOLESALE | MEAT | None | None | None | None | None | None | EFT | GRAIN GROWERS | No | None | No | No | No | Yes | No | BUTCHER |
24 | None | RETAIL | MEAT | None | None | None | None | None | None | EFT | None | None | None | None | None | None | None | None | BUTCHER |
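The PySpark chain above (groupBy, then pivot, then a `first` aggregate) can be mimicked in plain pandas. This is a minimal sketch with made-up rows, not the actual vw_customerfeatures data:

```python
import pandas as pd

# Toy rows shaped like the AddressKey / FeatureName / FeatureDescription table
df = pd.DataFrame({
    "AddressKey": [9, 9, 24],
    "FeatureName": ["Channel", "Type", "Channel"],
    "FeatureDescription": ["MEAT WHOLESALE", "BUTCHER", "RETAIL"],
})

# Equivalent of df.groupBy('AddressKey').pivot('FeatureName').agg(first(...)):
# each distinct FeatureName becomes a column, filled with the first
# FeatureDescription seen for that AddressKey
df_pivot = df.pivot_table(
    index="AddressKey",
    columns="FeatureName",
    values="FeatureDescription",
    aggfunc="first",
).reset_index()

print(df_pivot)
```

Keys that have no row for a given feature come out as NaN, matching the `None` cells in the pandas output above.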
Why does this happen? Is something wrong with my SQL query?
Thanks.
Answer 1

Score: 0
The problem was in the underlying view that supplies the FeatureName and FeatureDescription data: the records had trailing whitespace in those columns, so none of the PIVOT column labels matched.
FeatureName | len(FeatureName) | FeatureDescription | len(FeatureDescription) |
---|---|---|---|
Debtor Category | 50 | TRADE | 70 |
Type | 50 | DISTRIBUTOR | 70 |
Class | 50 | LOCAL | 70 |
I trimmed the underlying view and it worked.
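The effect can be reproduced in plain pandas (a hypothetical sketch, not the real view): a label padded with trailing whitespace never equals the clean label, so the clean column comes out missing, and trimming first fixes it. In the SQL view the equivalent fix is wrapping the column in `trim(FeatureName)`.

```python
import pandas as pd

# 'Channel ' has a trailing space, like the padded columns in the view
df = pd.DataFrame({
    "AddressKey": [9, 24],
    "FeatureName": ["Channel ", "Channel "],
    "FeatureDescription": ["MEAT WHOLESALE", "RETAIL"],
})

# Pivoting the padded labels: the resulting column is 'Channel ',
# so looking up the clean label 'Channel' finds nothing
wide = df.pivot_table(index="AddressKey", columns="FeatureName",
                      values="FeatureDescription", aggfunc="first")
print("Channel" in wide.columns)   # False: the only column is 'Channel '

# Trim the labels first, then pivot
df["FeatureName"] = df["FeatureName"].str.strip()
fixed = df.pivot_table(index="AddressKey", columns="FeatureName",
                       values="FeatureDescription", aggfunc="first")
print(fixed["Channel"].tolist())   # ['MEAT WHOLESALE', 'RETAIL']
```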
Thanks for the insight @nbk.