Databricks:SQL透视不起作用 – 但Python可以

huangapple go评论61阅读模式
英文:

Databricks: SQL Pivot does not work - but Python does

问题

I am trying to pivot a SQL table in Databricks on the FeatureName column using the FeatureDescription column as the value.

但是某种方式,我所拥有的SQL查询结果是在整个表格中填充了空值。

Here is the original table:

这是原始表格:

AddressKey FeatureName FeatureDescription
11101 Groups AG WAREHOUSE
11101 National Groups AG WAREHOUSE
5528 National Groups AIRR
5528 Groups AUSTRALIAN INDEPENDENT RURAL RESELLERS

Here is the SQL query:

以下是SQL查询:

SELECT *
FROM
  (SELECT AddressKey, FeatureName, FeatureDescription
   FROM radp_dev_cat.rdp_svr_sales.vw_customerfeatures) src
PIVOT (
  max(FeatureDescription)
  FOR FeatureName IN (
    'Carcasspricecategory' AS Carcasspricecategory,
    'Channel' AS Channel,
    'Class' AS Class,
    'Contractor' AS Contractor,
    'DebtorCategory' AS DebtorCategory,
    'DomesticCustTradingasExport' AS DomesticCustTradingasExport,
    'FarmingOperations' AS FarmingOperations,
    'FarmingProductionType' AS FarmingProductionType,
    'FeatureName' AS FeatureName,
    'FMInternalRegions' AS FMInternalRegions,
    'Groups' AS Groups,
    'MillITSupplierType' AS MillITSupplierType,
    'NationalGroups' AS NationalGroups,
    'PaymentMethod' AS PaymentMethod,
    'PurchaseGroup' AS PurchaseGroup,
    'RegisteredforGST' AS RegisteredforGST,
    'SalesGroup' AS SalesGroup,
    'SiteBaxters' AS SiteBaxters,
    'SiteCorowaMeats' AS SiteCorowaMeats,
    'SiteDVP' AS SiteDVP,
    'SiteFeedmill' AS SiteFeedmill,
    'SiteNSWWholesale' AS SiteNSWWholesale,
    'Type' AS Type
  )
);

The result:

结果:

AddressKey Carcass price category Channel Class Contractor Debtor Category Domestic Cust Trading as Export FM Internal Regions Farming Operations Farming Production Type Payment Method Purchase Group Registered for GST Sales Group Site Baxters Site Corowa Meats Site DVP Site Feedmill Site NSW Wholesale Type
9 null null null null null null null null null null null null null null null null null null null
24 null null null null null null null null null null null null null null null null null null null
1000 null null null null null null null null null null null null null null null null null null null

All null values for all columns across the table.

在整个表格中,所有列的值都是null。

I tried using any, first, max, etc. for the aggregate function. Same result.

我尝试使用anyfirstmax等聚合函数,但结果都是一样的。

But it works why I do the same thing using Python:

但是当我使用Python做同样的事情时,它可以工作:

%python
import pandas as pd

df = spark.sql('SELECT * FROM  radp_dev_cat.rdp_svr_sales.vw_customerfeatures')

df_pivot = df.groupBy('AddressKey').pivot('FeatureName').agg({"FeatureDescription": "first"}).toPandas()

df_pivot
AddressKey Carcass price category Channel Class Contractor Debtor Category Domestic Cust Trading as Export FM Internal Regions Farming Operations Farming Production Type Payment Method Purchase Group Registered for GST Sales Group Site Baxters Site Corowa Meats Site DVP Site Feedmill Site NSW Wholesale Type
9 None MEAT WHOLESALE MEAT None None None None None None EFT GRAIN GROWERS No None No No No Yes No BUTCHER
24 None RETAIL MEAT None None None None None None EFT None None None None None None None None BUTCHER
Why does this happen? Is something wrong with my SQL query?

为什么会发生这种情况?我的SQL查询有问题吗?

Thanks.

英文:

I am trying to pivot a SQL table in Databricks on the FeatureName column using the FeatureDescription column as the value.

But somehow the SQL query that I have resolves to null values being populated across the table.

Here is the original table:

AddressKey FeatureName FeatureDescription
11101 Groups AG WAREHOUSE
11101 National Groups AG WAREHOUSE
5528 National Groups AIRR
5528 Groups AUSTRALIAN INDEPENDENT RURAL RESELLERS

Here is the SQL query:

SELECT *
FROM
  (SELECT AddressKey, FeatureName, FeatureDescription
   FROM radp_dev_cat.rdp_svr_sales.vw_customerfeatures) src
PIVOT (
  max(FeatureDescription)
  FOR FeatureName IN (
    'Carcasspricecategory' AS Carcasspricecategory,
    'Channel' AS Channel,
    'Class' AS Class,
    'Contractor' AS Contractor,
    'DebtorCategory' AS DebtorCategory,
    'DomesticCustTradingasExport' AS DomesticCustTradingasExport,
    'FarmingOperations' AS FarmingOperations,
    'FarmingProductionType' AS FarmingProductionType,
    'FeatureName' AS FeatureName,
    'FMInternalRegions' AS FMInternalRegions,
    'Groups' AS Groups,
    'MillITSupplierType' AS MillITSupplierType,
    'NationalGroups' AS NationalGroups,
    'PaymentMethod' AS PaymentMethod,
    'PurchaseGroup' AS PurchaseGroup,
    'RegisteredforGST' AS RegisteredforGST,
    'SalesGroup' AS SalesGroup,
    'SiteBaxters' AS SiteBaxters,
    'SiteCorowaMeats' AS SiteCorowaMeats,
    'SiteDVP' AS SiteDVP,
    'SiteFeedmill' AS SiteFeedmill,
    'SiteNSWWholesale' AS SiteNSWWholesale,
    'Type' AS Type
  )
);

The result:

AddressKey Carcass price category Channel Class Contractor Debtor Category Domestic Cust Trading as Export FM Internal Regions Farming Operations Farming Production Type Payment Method Purchase Group Registered for GST Sales Group Site Baxters Site Corowa Meats Site DVP Site Feedmill Site NSW Wholesale Type
9 null null null null null null null null null null null null null null null null null null null
24 null null null null null null null null null null null null null null null null null null null
1000 null null null null null null null null null null null null null null null null null null null

All null values for all columns across the table.

I tried using any, first, max, etc. for the aggregate function. Same result.

But it works why I do the same thing using Python:

%python
import pandas as pd

df = spark.sql('SELECT * FROM  radp_dev_cat.rdp_svr_sales.vw_customerfeatures')

df_pivot = df.groupBy('AddressKey').pivot('FeatureName').agg({"FeatureDescription": "first"}).toPandas()

df_pivot
AddressKey Carcass price category Channel Class Contractor Debtor Category Domestic Cust Trading as Export FM Internal Regions Farming Operations Farming Production Type Payment Method Purchase Group Registered for GST Sales Group Site Baxters Site Corowa Meats Site DVP Site Feedmill Site NSW Wholesale Type
9 None MEAT WHOLESALE MEAT None None None None None None EFT GRAIN GROWERS No None No No No Yes No BUTCHER
24 None RETAIL MEAT None None None None None None EFT None None None None None None None None BUTCHER

Why does this happen? Is something wrong with my SQL query?

Thanks.

答案1

得分: 0

问题是具有FeatureName、FeatureDescription数据的底层视图存在问题。记录中的列有尾随的空白字符,因此没有一个PIVOT列匹配。

FeatureName len(FeatureName) FeatureDescription len(FeatureDescription)
Debtor Category 50 TRADE 70
Type 50 DISTRIBUTOR 70
Class 50 LOCAL 70

我修整了底层视图,问题得以解决。

感谢 @nbk 提供的见解。

英文:

The problem was the underlying view that had the FeatureName, FeatureDescription data. The columns had trailing whitespace in the records so none of the PIVOT columns matched.

FeatureName len(FeatureName) FeatureDescription len(FeatureDescription)
Debtor Category 50 TRADE 70
Type 50 DISTRIBUTOR 70
Class 50 LOCAL 70

I trimmed the underlying view and it worked.

Thanks for the insight @nbk.

huangapple
  • 本文由 发表于 2023年6月22日 14:01:35
  • 转载请务必保留本文链接:https://go.coder-hub.com/76528962.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定