Month over Month querying in multiple dimension tables
Question
Currently, we have a data warehouse (Azure SQL Server) that I am connecting to. The problem: I need month-over-month counts of vacant properties, separated by the region in which they are located. Properties are either multi-family (more than one door) or single-family (one door). If a property is vacant in one month and is still vacant in the next, it must be counted in both months. A record can remain idle, in which case there is no row in the DB that I can associate with the later month.
Context (simplified):
tblPWDimBuilding
BuildingID | Status | RowEffectiveDate | Organization Name |
---|---|---|---|
1234 | Occupied | 2023-05-17 | Cleveland |
1235 | Vacant | 2023-04-17 | Cleveland |
1236 | Occupied | 2023-05-17 | Cleveland |
1237 | Vacant | 2023-05-17 | Cleveland |
1238 | Vacant | 2023-03-17 | Cleveland |
tblPWDimUnit
UnitId | Status | RowEffectiveDate | Organization Name | BuildingID |
---|---|---|---|---|
2123 | Occupied | 2023-05-17 | Cleveland | 1238 |
2124 | Vacant | 2023-04-17 | Cleveland | 1238 |
RowEffectiveDate - Because the dim tables hold historical data, this column gives the date from which the record is in effect.
RowExpirationDate - Also for historical data. This column gives the date on which the record expired (became inactive); for active records the value is NULL.
RowIsActive - 1 marks an active record; 0 marks an inactive (historical) record.
The expected results from the tables above:
Year | Month | Organization Name | Vacant Count |
---|---|---|---|
2023 | 04 | Cleveland | 2 |
2023 | 05 | Cleveland | 3 (not 1) |
(Note that for a multi-family property, the building's status is ignored and the units are considered instead.)
In April, the results come from the single-family home with building ID 1235 and from the multi-family building with ID 1238, where one door (unit ID 2124) is vacant. Moving into May, there are no new records in the dim tables for those doors, so the 2 doors from April are still vacant in May and need to be counted, along with the newly vacant building with ID 1237. This process will be used to review the trailing 6-12 months, so we can't just look at the previous month. I am not sure how to emulate this waterfall-like counting process. My unsuccessful attempt is below:
SELECT
    DATEPART(YEAR, RowEffectiveDate) AS Year,
    DATEPART(MONTH, RowEffectiveDate) AS Month,
    OrganizationName,
    COUNT(DISTINCT b.BuildingId) AS [Vacant Count]
FROM curated.tblPWDimBuilding b
WHERE b.Status = 'Vacant'
    AND b.Active = 'True'
GROUP BY
    DATEPART(YEAR, RowEffectiveDate),
    DATEPART(MONTH, RowEffectiveDate),
    OrganizationName
UNION
SELECT
    DATEPART(YEAR, u.RowEffectiveDate) AS Year,
    DATEPART(MONTH, u.RowEffectiveDate) AS Month,
    u.OrganizationName,
    COUNT(DISTINCT u.UnitID) AS [Vacant Count]
FROM curated.tblPWDimBuilding b
INNER JOIN curated.tblPWDimUnits u ON b.BuildingId = u.BuildingId
    AND u.Status = 'Vacant'
WHERE b.Active = 'True'
GROUP BY
    DATEPART(YEAR, u.RowEffectiveDate),
    DATEPART(MONTH, u.RowEffectiveDate),
    u.OrganizationName;
In the code above, I union the single-family perspective with the multi-family perspective at the unit level. I then group by the year and month in which the record became effective in the database, and by the organization (region) in which the property is located. Active = 'True' is a flag specific to the source system that ensures the record wasn't deactivated.
The above is close, but because it uses the row effective date, it fails to account for a vacant property that has been idle and still needs to be reflected in the next month (the record from the earlier month is still the one marked as active). The grouping then no longer works correctly. I'm stuck! How might one solve this? Where might I have gone wrong?
Answer 1
Score: 0
For records that remain idle and need to be counted in later months, use a date dimension table, or generate a list of months in a CTE, and join that to your current query. Below, the MonthList CTE builds the needed list of month starts from the RowEffectiveDate column in both tables. Then, for each month, the query treats a record as active during that month if its RowEffectiveDate is less than or equal to the month start and its RowExpirationDate is greater than the month start or NULL, which carries idle records forward into the following months' counts:
WITH MonthList
AS (
    SELECT DISTINCT DATEADD(MONTH, DATEDIFF(MONTH, 0, RowEffectiveDate), 0) AS MonthStart
    FROM (
        SELECT RowEffectiveDate FROM curated.tblPWDimBuilding
        UNION
        SELECT RowEffectiveDate FROM curated.tblPWDimUnits
    ) AS Dates
)
SELECT
    DATEPART(YEAR, m.MonthStart) AS Year,
    DATEPART(MONTH, m.MonthStart) AS Month,
    OrganizationName,
    COUNT(DISTINCT b.BuildingId) AS [Vacant Count]
FROM MonthList m
INNER JOIN curated.tblPWDimBuilding b ON b.STATUS = 'Vacant'
    AND b.Active = 'True'
    AND b.RowEffectiveDate <= m.MonthStart
    AND (
        b.RowExpirationDate > m.MonthStart
        OR b.RowExpirationDate IS NULL
    )
GROUP BY DATEPART(YEAR, m.MonthStart), DATEPART(MONTH, m.MonthStart), OrganizationName
UNION /* should this be UNION ALL ?? */
SELECT
    DATEPART(YEAR, m.MonthStart) AS Year,
    DATEPART(MONTH, m.MonthStart) AS Month,
    u.OrganizationName,
    COUNT(DISTINCT u.UnitID) AS [Vacant Count]
FROM MonthList m
LEFT JOIN curated.tblPWDimBuilding b ON b.Active = 'True'
    AND b.RowEffectiveDate <= m.MonthStart
    AND (
        b.RowExpirationDate > m.MonthStart
        OR b.RowExpirationDate IS NULL
    )
INNER JOIN curated.tblPWDimUnits u ON b.BuildingId = u.BuildingId
    AND u.STATUS = 'Vacant'
    AND u.RowEffectiveDate <= m.MonthStart
    AND (
        u.RowExpirationDate > m.MonthStart
        OR u.RowExpirationDate IS NULL
    )
GROUP BY DATEPART(YEAR, m.MonthStart), DATEPART(MONTH, m.MonthStart), u.OrganizationName;
Note that without sample data or an expected result it isn't possible to verify this query, but I think the important part is to create and use a date dimension table, or keep generating a list of months, and then join your data to it, because that enables the "look ahead" logic you need.
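The month-list idea can be tried against the sample rows from the question. The following is a minimal sketch, not a verified solution: it uses Python's sqlite3 with SQLite date functions as a stand-in for Azure SQL, shortens the table names to building and unit, and assumes that (a) a building with no unit rows is single-family, (b) a row counts for a month if it is in effect at the end of that month, and (c) the source-system Active flag can be left out. Assumption (b) differs from the month-start comparison in the query above; it is chosen here because the question's expected output counts rows effective on 2023-04-17 in April. The month list is generated as a contiguous range rather than taken from RowEffectiveDate values, so months with no new rows still appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows copied from the question; table names are shortened (assumption).
conn.executescript("""
CREATE TABLE building (BuildingId INT, Status TEXT, RowEffectiveDate TEXT,
                       RowExpirationDate TEXT, OrganizationName TEXT);
CREATE TABLE unit (UnitId INT, Status TEXT, RowEffectiveDate TEXT,
                   RowExpirationDate TEXT, OrganizationName TEXT, BuildingId INT);
INSERT INTO building VALUES
  (1234, 'Occupied', '2023-05-17', NULL, 'Cleveland'),
  (1235, 'Vacant',   '2023-04-17', NULL, 'Cleveland'),
  (1236, 'Occupied', '2023-05-17', NULL, 'Cleveland'),
  (1237, 'Vacant',   '2023-05-17', NULL, 'Cleveland'),
  (1238, 'Vacant',   '2023-03-17', NULL, 'Cleveland');
INSERT INTO unit VALUES
  (2123, 'Occupied', '2023-05-17', NULL, 'Cleveland', 1238),
  (2124, 'Vacant',   '2023-04-17', NULL, 'Cleveland', 1238);
""")

query = """
WITH RECURSIVE months(MonthStart) AS (
    -- contiguous list of month starts between :first and :last
    SELECT :first
    UNION ALL
    SELECT date(MonthStart, '+1 month') FROM months WHERE MonthStart < :last
),
doors AS (
    -- single-family: vacant buildings that have no unit rows (assumption)
    SELECT b.OrganizationName, 'B' || b.BuildingId AS DoorKey,
           b.RowEffectiveDate, b.RowExpirationDate
    FROM building b
    WHERE b.Status = 'Vacant'
      AND NOT EXISTS (SELECT 1 FROM unit u WHERE u.BuildingId = b.BuildingId)
    UNION ALL
    -- multi-family: vacant units; the building's own status is ignored
    SELECT u.OrganizationName, 'U' || u.UnitId,
           u.RowEffectiveDate, u.RowExpirationDate
    FROM unit u
    WHERE u.Status = 'Vacant'
)
SELECT strftime('%Y', m.MonthStart) AS Year,
       strftime('%m', m.MonthStart) AS Month,
       d.OrganizationName,
       COUNT(DISTINCT d.DoorKey) AS VacantCount
FROM months m
JOIN doors d
  -- row took effect before the next month starts ...
  ON d.RowEffectiveDate < date(m.MonthStart, '+1 month')
  -- ... and has not expired before the next month starts
 AND (d.RowExpirationDate IS NULL
      OR d.RowExpirationDate >= date(m.MonthStart, '+1 month'))
GROUP BY m.MonthStart, d.OrganizationName
ORDER BY m.MonthStart
"""

rows = conn.execute(query, {"first": "2023-04-01", "last": "2023-05-01"}).fetchall()
for row in rows:
    print(row)
```

With the question's sample rows this returns a count of 2 for April and 3 for May. In T-SQL the same boundary could be written with DATEADD(MONTH, 1, m.MonthStart) in place of SQLite's date(..., '+1 month').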
I'm also unsure whether you should use UNION or UNION ALL. UNION may be incorrect because it can suppress rows that should be reported, but again, without data it is difficult to judge.
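The UNION question can be checked on a toy case. The sketch below again uses Python's sqlite3 as a stand-in (UNION and UNION ALL follow standard SQL semantics here), with made-up literal rows standing in for the building branch and the unit branch. It shows that when both branches return the same year, month, organization, and count, UNION keeps only one of the two rows, and that one way to get a single total per month is UNION ALL wrapped in an outer SUM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical result of the building branch and the unit branch for one month:
# each branch happens to report one vacant door for Cleveland in May 2023.
branch = "SELECT 2023 AS Year, 5 AS Month, 'Cleveland' AS OrganizationName, 1 AS VacantCount"

# UNION removes duplicate rows, so the two identical branch rows collapse to one.
union_rows = conn.execute(f"{branch} UNION {branch}").fetchall()

# UNION ALL keeps both rows.
union_all_rows = conn.execute(f"{branch} UNION ALL {branch}").fetchall()

# Summing over UNION ALL gives one row per month and organization.
totals = conn.execute(f"""
    SELECT Year, Month, OrganizationName, SUM(VacantCount) AS VacantCount
    FROM ({branch} UNION ALL {branch})
    GROUP BY Year, Month, OrganizationName
""").fetchall()

print(len(union_rows), len(union_all_rows), totals)
```

Here UNION yields one row, UNION ALL yields two, and the summed version yields a single row with a count of 2.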
Answer 2
Score: 0
It seems that you need to track vacant properties month over month, for both single-family and multi-family properties. To handle properties that remain vacant in the next month but may have no corresponding row in the database, you can modify your query to include a subquery that identifies the properties that were vacant in the previous month.
Here's an example of how you can modify your query to achieve this:
-- Columns that exist in both tables are qualified with b. to avoid
-- ambiguous-column errors once the unit table is joined.
WITH PreviousMonthVacant AS (
    SELECT
        DATEADD(MONTH, -1, b.RowEffectiveDate) AS PreviousMonth,
        CASE
            WHEN b.Status = 'Vacant' THEN b.BuildingId
            WHEN u.Status = 'Vacant' THEN u.BuildingId
            ELSE NULL
        END AS VacantBuildingId,
        CASE
            WHEN b.Status = 'Vacant' THEN NULL
            WHEN u.Status = 'Vacant' THEN u.UnitId
            ELSE NULL
        END AS VacantUnitId
    FROM curated.tblPWDimBuilding b
    LEFT JOIN curated.tblPWDimUnits u ON b.BuildingId = u.BuildingId
    WHERE
        b.Active = 'True'
        OR u.Active = 'True'
)
SELECT
    DATEPART(YEAR, b.RowEffectiveDate) AS Year,
    DATEPART(MONTH, b.RowEffectiveDate) AS Month,
    b.OrganizationName,
    COUNT(DISTINCT b.BuildingId) AS [Vacant Count]
FROM curated.tblPWDimBuilding b
LEFT JOIN curated.tblPWDimUnits u ON b.BuildingId = u.BuildingId
WHERE
    b.Active = 'True'
    OR u.Active = 'True'
    OR b.BuildingId IN (
        SELECT VacantBuildingId
        FROM PreviousMonthVacant
        WHERE VacantBuildingId IS NOT NULL
            AND DATEPART(YEAR, PreviousMonth) = DATEPART(YEAR, b.RowEffectiveDate)
            AND DATEPART(MONTH, PreviousMonth) = DATEPART(MONTH, b.RowEffectiveDate)
    )
GROUP BY
    DATEPART(YEAR, b.RowEffectiveDate),
    DATEPART(MONTH, b.RowEffectiveDate),
    b.OrganizationName
I used a common table expression (CTE) to identify the vacant properties in the previous month. The CTE includes the previous month's date, the vacant building ID, and the vacant unit ID.
The building and unit tables are left-joined so that vacant properties are included in the result. A condition then brings in the previous month's vacant properties by checking whether the building ID appears in the subquery results.
By including this logic, you should be able to capture the vacant properties month-over-month, even if they were idle in the previous month and do not have a corresponding row in the database.
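Since the question asks about a trailing 6-12 month window, it may help to check how far a previous-month lookback carries a vacancy forward compared with an effective/expiration interval test. The sketch below is a simplified stand-in, not the query above: it uses Python's sqlite3, a hypothetical four-month months table, and a single building row (ID 1238 from the question) that became vacant on 2023-03-17 and has no expiration date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE building (BuildingId INT, Status TEXT,
                       RowEffectiveDate TEXT, RowExpirationDate TEXT);
INSERT INTO building VALUES (1238, 'Vacant', '2023-03-17', NULL);
-- hypothetical reporting months, one row per month start
CREATE TABLE months (MonthStart TEXT);
INSERT INTO months VALUES
  ('2023-03-01'), ('2023-04-01'), ('2023-05-01'), ('2023-06-01');
""")

# One-month lookback: the row counts in its effective month and the month after.
lookback = [r[0] for r in conn.execute("""
    SELECT m.MonthStart
    FROM months m
    JOIN building b
      ON b.Status = 'Vacant'
     AND date(b.RowEffectiveDate, 'start of month')
         IN (m.MonthStart, date(m.MonthStart, '-1 month'))
    ORDER BY m.MonthStart
""")]

# Interval test: the row counts in every month in which it is still in effect.
interval = [r[0] for r in conn.execute("""
    SELECT m.MonthStart
    FROM months m
    JOIN building b
      ON b.Status = 'Vacant'
     AND b.RowEffectiveDate < date(m.MonthStart, '+1 month')
     AND (b.RowExpirationDate IS NULL
          OR b.RowExpirationDate >= date(m.MonthStart, '+1 month'))
    ORDER BY m.MonthStart
""")]

print("lookback:", lookback)
print("interval:", interval)
```

In this toy case the lookback reports the building for March and April only, while the interval test reports it for all four months, which is closer to the carry-forward behaviour the question describes.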