SQL按年份统计数量

huangapple go评论58阅读模式
英文:

SQL Count by Year

问题

这是有错误结果的部分:

SELECT 		
	COUNT (DISTINCT a.I_CUSTOMER_M) AS 'Visited Customer',
	COUNT (DISTINCT c.S_CUSTNO) AS 'Buying Customer',
	year ( i.D_INVOICEDATE) AS 'Year'

FROM
	 sao.customer_M c
	LEFT JOIN sao.invoice_p i 
		on i.I_CUSTOMER_M = c.I_CUSTOMER_M
	LEFT  JOIN sao.ACTIVITY_P a
		on 	(i.I_customer_M = a.I_CUSTOMER_M and a.I_ACTIVTYPE_C = 11 
	OR i.I_CUSTOMER_M IS NULL )

WHERE 
	i.i_invoice_p>0 AND 
	i.B_PROFORMA = 0 and
	i.B_CREDITNOTE = 0 and
	i.B_CANCEL = 0 and
	i.dt_deleted IS NULL AND
	datepart(yyyy,getdate()) - datepart(yyyy,i.D_INVOICEDATE) <= 2 

GROUP BY
	year (i.D_INVOICEDATE)

这是只查询单一年份的示例:

SELECT	
	COUNT (DISTINCT a.I_CUSTOMER_M) AS 'Visited Customer',
	COUNT (DISTINCT c.S_CUSTNO) AS 'Buying Customer',
	year ( i.D_INVOICEDATE) AS 'Year'
FROM
	sao.customer_M c 
LEFT JOIN sao.invoice_p i
	ON i.I_CUSTOMER_M = c.I_CUSTOMER_M
LEFT  JOIN sao.ACTIVITY_P a
	ON 	(i.I_customer_M = a.I_CUSTOMER_M and a.I_ACTIVTYPE_C = 11
	and datepart(yyyy,getdate()) - datepart(yyyy,a.D_ACTIVITYDATE) = 1  OR i.I_CUSTOMER_M IS NULL )
WHERE 
	i.i_invoice_p>0 AND 
	i.B_PROFORMA = 0 and
	i.B_CREDITNOTE = 0 and
	i.B_CANCEL = 0 and
	i.dt_deleted IS NULL AND
	datepart(yyyy,getdate()) - datepart(yyyy,i.D_INVOICEDATE) = 1 
	GROUP BY YEAR(i.D_INVOICEDATE)

所以你可以看到"Visited Customer"的计数有很大的差异。其他都没问题,不确定我做错了什么。有人可以帮我吗?

英文:

I'm trying to write a query to compare the number of customers with visits and the number of customers who buy per year. If I do the query for the individual year without Group By, the number is correct, but I need it as a correct table in the system to be able to continue working with it.

This is the one with the incorrect results.

SELECT 		
	COUNT (DISTINCT a.I_CUSTOMER_M) AS &#39;Visited Customer&#39;,
	COUNT (DISTINCT c.S_CUSTNO) AS &#39;Buying Customer&#39;,
	year ( i.D_INVOICEDATE) AS &#39;Year&#39;

FROM
	 sao.customer_M c
	LEFT JOIN sao.invoice_p i 
		on i.I_CUSTOMER_M = c.I_CUSTOMER_M
	LEFT  JOIN sao.ACTIVITY_P a
		on 	(i.I_customer_M = a.I_CUSTOMER_M and a.I_ACTIVTYPE_C = 11 
	OR i.I_CUSTOMER_M IS NULL )

WHERE 
	i.i_invoice_p&gt;0 AND 
	i.B_PROFORMA = 0 and
	i.B_CREDITNOTE = 0 and
	i.B_CANCEL = 0 and
	i.dt_deleted IS NULL AND
	datepart(yyyy,getdate()) - datepart(yyyy,i.D_INVOICEDATE) &lt;= 2 

GROUP BY
	year (i.D_INVOICEDATE)
Visited Customer Buying Customer Year
507 1490 2021
509 1452 2022
438 1143 2023

And here is an example if I just make the query for one single year.

SELECT	
				COUNT (DISTINCT a.I_CUSTOMER_M) AS &#39;Visited Customer&#39;,
				COUNT (DISTINCT c.S_CUSTNO) AS &#39;Buying Customer&#39;,
				year ( i.D_INVOICEDATE) AS &#39;Year&#39;
			FROM
				 sao.customer_M c 
			LEFT JOIN sao.invoice_p i
				on i.I_CUSTOMER_M = c.I_CUSTOMER_M
			LEFT  JOIN sao.ACTIVITY_P a
				on 	(i.I_customer_M = a.I_CUSTOMER_M and a.I_ACTIVTYPE_C = 11
				and datepart(yyyy,getdate()) - datepart(yyyy,a.D_ACTIVITYDATE) = 1  OR i.I_CUSTOMER_M IS NULL )
			WHERE 
				i.i_invoice_p&gt;0 AND 
				i.B_PROFORMA = 0 and
				i.B_CREDITNOTE = 0 and
				i.B_CANCEL = 0 and
				i.dt_deleted IS NULL AND
				datepart(yyyy,getdate()) - datepart(yyyy,i.D_INVOICEDATE) = 1 
				GROUP BY YEAR(i.D_INVOICEDATE)
Visited Customer Buying Customer Year
235 1452 2022

So as you can see the count of the Visited Customer has a big difference. Everything else is fine, not sure what I#m doing wrong. Can someone please help me?

答案1

得分: 0

在询问这种问题时,提供示例的DDL/DML非常有帮助,因为它可以帮助那些想要回答你问题的人了解你的数据具体是什么样子的。

以下是一些相当基本的DDL/DML示例,希望足以显示你可能卡住的地方:

DECLARE @Activities TABLE (ActivityID BIGINT IDENTITY, CustomerID BIGINT, ActivityDate DATETIME);
DECLARE @Invoices   TABLE (InvoiceID BIGINT IDENTITY, CustomerID BIGINT, InvoiceDate DATETIME);
DECLARE @Customers  TABLE (CustomerID BIGINT IDENTITY, FirstName NVARCHAR(30), LastName NVARCHAR(30));
INSERT INTO @Customers (FirstName, LastName) VALUES
('Margaret', 'Ramirez'),('Brenda', 'Bailey'),('Helen', 'Richardson'),('Timothy', 'Carter'),('Jessica', 'Thomas'),('Jason', 'Diaz'),('Ryan', 'Collins'),('Christopher', 'Lee'),('Jerry', 'Foster'),('Emily', 'Scott'),('Barbara', 'Lopez'),
('Jennifer', 'Garcia'),('Jacob', 'Stewart'),('David', 'Rodriguez'),('Ronald', 'Gomez'),('Anna', 'Cooper'),('Catherine', 'Myers'),('Daniel', 'Thompson'),('Christine', 'Bennet'),('Cynthia', 'Reyes'),
('Brian', 'Hall'),('Kathleen', 'Morris'),('Anna', 'Cooper'),('Benjamin', 'Watson'),('Benjamin', 'Watson'),('Samuel', 'Chavez'),('Donald', 'Walker'),('Angela', 'Rogers'),('Dennis', 'Long'),('Elizabeth', 'Martinez');

INSERT INTO @Activities (CustomerID, ActivityDate) 
SELECT TOP 50 ROUND((30 * Rnd1 + 1), 0), DATEADD(DAY,-ROUND((1095 * Rnd2 + 0), 0),GETDATE())
  FROM (VALUES (RAND(CONVERT(VARBINARY,NEWID(),1)),RAND(CONVERT(VARBINARY,NEWID(),1)))) a (Rnd1, Rnd2)
    CROSS APPLY sys.sysobjects b
	CROSS APPLY sys.sysobjects c
INSERT INTO @Invoices (CustomerID, InvoiceDate) 
SELECT TOP 50 ROUND((30 * Rnd1 + 1), 0), DATEADD(DAY,-ROUND((1095 * Rnd2 + 0), 0),GETDATE())
  FROM (VALUES (RAND(CONVERT(VARBINARY,NEWID(),1)),RAND(CONVERT(VARBINARY,NEWID(),1)))) a (Rnd1, Rnd2)
    CROSS APPLY sys.sysobjects b
	CROSS APPLY sys.sysobjects c;

我们在这里只是创建了一些表变量,并用一些随机数据填充它们。顾客只是随机匹配的名字和姓氏,而发票/活动是在今天和三年前之间的某个时间随机生成的日期。

我认为你得到不正确的值是因为数据之间的关系只有在有共同性的地方才有效,也许是顾客访问并购买的地方?如果你首先将这些聚合数据分开,然后再将它们组合起来,你应该能找到你要找的数字:

;WITH InvoiceCounts AS (
SELECT DATEPART(YEAR,InvoiceDate) AS Year, COUNT(DISTINCT c.CustomerID) AS Count
  FROM @Customers c
    INNER JOIN @Invoices i
	  ON c.CustomerID = i.CustomerID
 WHERE i.InvoiceID > 0
 GROUP BY DATEPART(YEAR,InvoiceDate)
), ActivityCounts AS (
SELECT DATEPART(YEAR,ActivityDate) AS Year, COUNT(DISTINCT c.CustomerID) AS Count
  FROM @Customers c
	INNER JOIN @Activities a
	  ON c.CustomerID = a.CustomerID
 WHERE a.CustomerID > -1
 GROUP BY DATEPART(YEAR,ActivityDate)
), Years AS (
SELECT Year FROM InvoiceCounts
UNION
SELECT Year FROM ActivityCounts
)

SELECT a.Count AS VisitingCustomers, i.Count AS BuyingCustomers, y.Year AS Year
  FROM Years y
    LEFT OUTER JOIN InvoiceCounts i
	  ON y.Year = i.Year
	LEFT OUTER JOIN ActivityCounts a
	  ON y.Year = a.Year

首先我们从发票中聚合数据,然后从访问中也是一样。第三个CTE只是给我们提供了有访问或活动的年份列表。最后,我们使用这些CTE并将它们与年份连接在一起。

你需要做一些工作,使这适用于你的数据集。你应该能够使用你的查询替换合适的集合CTE。

英文:

When asking questions like this it's very helpful to provide example DDL/DML. It helps folks who want to answer your question by detailing exactly what your data looks like.

Here's some pretty basic example DDL/DML which is hopefully enough to show where I think you've gotten hung up.

DECLARE @Activities TABLE (ActivityID BIGINT IDENTITY, CustomerID BIGINT, ActivityDate DATETIME);
DECLARE @Invoices   TABLE (InvoiceID BIGINT IDENTITY, CustomerID BIGINT, InvoiceDate DATETIME);
DECLARE @Customers  TABLE (CustomerID BIGINT IDENTITY, FirstName NVARCHAR(30), LastName NVARCHAR(30));
INSERT INTO @Customers (FirstName, LastName) VALUES
(&#39;Margaret&#39;, &#39;Ramirez&#39;),(&#39;Brenda&#39;, &#39;Bailey&#39;),(&#39;Helen&#39;, &#39;Richardson&#39;),(&#39;Timothy&#39;, &#39;Carter&#39;),(&#39;Jessica&#39;, &#39;Thomas&#39;),(&#39;Jason&#39;, &#39;Diaz&#39;),(&#39;Ryan&#39;, &#39;Collins&#39;),(&#39;Christopher&#39;, &#39;Lee&#39;),(&#39;Jerry&#39;, &#39;Foster&#39;),(&#39;Emily&#39;, &#39;Scott&#39;),(&#39;Barbara&#39;, &#39;Lopez&#39;),
(&#39;Jennifer&#39;, &#39;Garcia&#39;),(&#39;Jacob&#39;, &#39;Stewart&#39;),(&#39;David&#39;, &#39;Rodriguez&#39;),(&#39;Ronald&#39;, &#39;Gomez&#39;),(&#39;Anna&#39;, &#39;Cooper&#39;),(&#39;Catherine&#39;, &#39;Myers&#39;),(&#39;Daniel&#39;, &#39;Thompson&#39;),(&#39;Christine&#39;, &#39;Bennet&#39;),(&#39;Cynthia&#39;, &#39;Reyes&#39;),
(&#39;Brian&#39;, &#39;Hall&#39;),(&#39;Kathleen&#39;, &#39;Morris&#39;),(&#39;Anna&#39;, &#39;Cooper&#39;),(&#39;Benjamin&#39;, &#39;Watson&#39;),(&#39;Benjamin&#39;, &#39;Watson&#39;),(&#39;Samuel&#39;, &#39;Chavez&#39;),(&#39;Donald&#39;, &#39;Walker&#39;),(&#39;Angela&#39;, &#39;Rogers&#39;),(&#39;Dennis&#39;, &#39;Long&#39;),(&#39;Elizabeth&#39;, &#39;Martinez&#39;);

INSERT INTO @Activities (CustomerID, ActivityDate) 
SELECT TOP 50 ROUND((30 * Rnd1 + 1), 0), DATEADD(DAY,-ROUND((1095 * Rnd2 + 0), 0),GETDATE())
  FROM (VALUES (RAND(CONVERT(VARBINARY,NEWID(),1)),RAND(CONVERT(VARBINARY,NEWID(),1)))) a (Rnd1, Rnd2)
    CROSS APPLY sys.sysobjects b
	CROSS APPLY sys.sysobjects c
INSERT INTO @Invoices (CustomerID, InvoiceDate) 
SELECT TOP 50 ROUND((30 * Rnd1 + 1), 0), DATEADD(DAY,-ROUND((1095 * Rnd2 + 0), 0),GETDATE())
  FROM (VALUES (RAND(CONVERT(VARBINARY,NEWID(),1)),RAND(CONVERT(VARBINARY,NEWID(),1)))) a (Rnd1, Rnd2)
    CROSS APPLY sys.sysobjects b
	CROSS APPLY sys.sysobjects c;

All we're doing here is creating a few table variables, and populating them with some random data. The customers are just random pairings of first and last names, and the invoices/activities are randomly generated dates at some point between today and three years ago.

I think you're getting incorrect values because of the relationship between the data only being valid where there's commonality, perhaps where a customer visited and purchased? If you segregate those aggregates first, and then bring them back together you should find the numbers you're looking for:

;WITH InvoiceCounts AS (
SELECT DATEPART(YEAR,InvoiceDate) AS Year, COUNT(DISTINCT c.CustomerID) AS Count
  FROM @Customers c
    INNER JOIN @Invoices i
	  ON c.CustomerID = i.CustomerID
 WHERE i.InvoiceID &gt; 0
 GROUP BY DATEPART(YEAR,InvoiceDate)
), ActivityCounts AS (
SELECT DATEPART(YEAR,ActivityDate) AS Year, COUNT(DISTINCT c.CustomerID) AS Count
  FROM @Customers c
	INNER JOIN @Activities a
	  ON c.CustomerID = a.CustomerID
 WHERE a.CustomerID &gt; -1
 GROUP BY DATEPART(YEAR,ActivityDate)
), Years AS (
SELECT Year FROM InvoiceCounts
UNION
SELECT Year FROM ActivityCounts
)

SELECT a.Count AS VisitingCustomers, i.Count AS BuyingCustomers, y.Year AS Year
  FROM Years y
    LEFT OUTER JOIN InvoiceCounts i
	  ON y.Year = i.Year
	LEFT OUTER JOIN ActivityCounts a
	  ON y.Year = a.Year

First we aggregate up the data from Invoices, and then the same for visits. The third CTE just gives us a list of years with visits or activity. Finally, we use the CTEs and join them on the year.

You do have some work to do here, to make this applicable to your data set. You should be able to take your queries and replace the collection CTEs with appropriate queries.

no data table as the results are random and won't match anyway SQL按年份统计数量
the WHERE for the collection CTEs serves no purpose but as an example.

huangapple
  • 本文由 发表于 2023年7月17日 21:59:17
  • 转载请务必保留本文链接:https://go.coder-hub.com/76705202.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定