BigQuery – How do I check if data is missing from a table for a specific calendar date?

Question

I have a table containing calendar dates in ascending order. I have another table containing employee timesheets. The timesheet table only gets a row when a timesheet has been created. What I am trying to find out is: for a given date X, did the employee complete a timesheet?

My latest query is below. I'm jumping back into SQL after a couple of years off, and I'm certain I'm missing something simple:

SELECT
  D.CALENDAR_DATE,
  E.EMPLOYEE,
  ROW_NUMBER() OVER (PARTITION BY E.EMPLOYEE, E.CALENDAR_DATE ORDER BY D.CALENDAR_DATE ASC) AS NUMBER_OF_TIMESHEETS
FROM DIM.CALENDAR D
LEFT JOIN EMPLOYEE_TIMESHEETS E ON E.TIMESHEET_DATE = D.CALENDAR_DATE
WHERE
  CALENDAR_DATE BETWEEN '2023-04-01' AND '2023-07-19'
ORDER BY CALENDAR_DATE ASC

Desired Results:

Calendar Date   Employee      Number of Timesheets
4/1/2023        Jerry Jones   1
4/2/2023        Jerry Jones   1
4/3/2023        Jerry Jones   0
4/4/2023        Jerry Jones   0
4/5/2023        Jerry Jones   1
4/1/2023        Sarah Smith   1
4/2/2023        Sarah Smith   0
4/3/2023        Sarah Smith   0
4/4/2023        Sarah Smith   0
4/5/2023        Sarah Smith   1

Current Output:

Calendar Date   Employee      Number of Timesheets
4/1/2023        Jerry Jones   1
4/3/2023        Jerry Jones   1
4/5/2023        Jerry Jones   1
4/1/2023        Sarah Smith   1
4/5/2023        Sarah Smith   1

Answer 1

Score: 1

You could use a Cartesian (cross) join and conditional aggregation with your current layout.

create table calendar (
  calendar_date date
);

insert into calendar values
('2023-04-01'),
('2023-04-02'),
('2023-04-03'),
('2023-04-04'),
('2023-10-01');

create table employee_timesheets (
  employee varchar(10),
  timesheet_date date
);

insert into employee_timesheets values
('Jerry', '2023-04-01'),
('Jerry', '2023-04-02'),
('Tom', '2023-04-03'),
('Sarah', '2023-04-01'),
('Sarah', '2023-04-02'),
('Sarah', '2023-04-04'),
('Sarah', '2023-10-01');

select d.calendar_date, e.employee,
  max(case when d.calendar_date = e.timesheet_date then 1 else 0 end) nbr_of_timesheets
from calendar d, employee_timesheets e
where d.calendar_date between '2023-04-01' and '2023-07-19'
group by 1,2
order by 2,1;

Results:

calendar_date              employee   nbr_of_timesheets
2023-04-01T00:00:00.000Z   Jerry      1
2023-04-02T00:00:00.000Z   Jerry      1
2023-04-03T00:00:00.000Z   Jerry      0
2023-04-04T00:00:00.000Z   Jerry      0
2023-04-01T00:00:00.000Z   Sarah      1
2023-04-02T00:00:00.000Z   Sarah      1
2023-04-03T00:00:00.000Z   Sarah      0
2023-04-04T00:00:00.000Z   Sarah      1
2023-04-01T00:00:00.000Z   Tom        0
2023-04-02T00:00:00.000Z   Tom        0
2023-04-03T00:00:00.000Z   Tom        1
2023-04-04T00:00:00.000Z   Tom        0

View on DB Fiddle
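
The query above pairs every calendar date with every timesheet row before aggregating. A possible variation, sketched here on the assumption that the tables are named `calendar` and `employee_timesheets` as in the sample schema, first builds the date-by-employee grid from the distinct employees and then left joins the timesheets onto it. `COUNT` over the joined column ignores NULLs, so a missing day yields 0 and a day with timesheets yields their actual number. In BigQuery the table names would normally need a dataset prefix, which is omitted here.

```sql
-- Sketch only: table and column names follow the answer's sample schema.
WITH employees AS (
  -- One row per employee that has at least one timesheet.
  SELECT DISTINCT employee
  FROM employee_timesheets
)
SELECT
  d.calendar_date,
  e.employee,
  -- COUNT skips NULLs, so dates without a matching timesheet give 0.
  COUNT(t.timesheet_date) AS nbr_of_timesheets
FROM calendar AS d
CROSS JOIN employees AS e
LEFT JOIN employee_timesheets AS t
  ON t.employee = e.employee
 AND t.timesheet_date = d.calendar_date
WHERE d.calendar_date BETWEEN DATE '2023-04-01' AND DATE '2023-07-19'
GROUP BY d.calendar_date, e.employee
ORDER BY e.employee, d.calendar_date;
```

As with the original query, an employee who has never submitted a timesheet does not appear, because the employee list is derived from the timesheet table itself.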

Answer 2

Score: 0

Building on your current design, you really should have three tables: one for employees, one for calendar dates, and one for timesheets. The timesheet table should reference an employee ID rather than store the employee name.

The sample query below displays every date in the given range and every employee, along with any hours they logged. An employee with no hours on a date still appears, with 0 hours. A date on which nobody worked still appears as well.

SELECT c.calendar_date, e.name, ifnull(t.hours, 0) hours
  FROM calendar c
    left join employee e on true
    left join timesheet t on c.calendar_date = t.calendar_date and e.employee_id = t.employee_id
WHERE c.calendar_date between '2023-08-01' and '2023-08-04'
order by c.calendar_date, e.name

I'll also mention that a calendar table storing every possible working date is not the best design. I think a better solution is to use a programming language such as PHP to create a data structure (an array or multi-dimensional array) that contains all dates for the week. You would then query the database, merge the results into that data structure, and output it. With a design like that you would not need a calendar table. But if your goal is to write standard SQL, the above should be the best solution.
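
If the only goal is to avoid maintaining a calendar table, BigQuery can also generate the dates inside the query itself. The following is a minimal sketch, assuming the `employee` and `timesheet` tables from the design described in this answer; the alias `work_date` is made up for illustration.

```sql
-- Sketch only: GENERATE_DATE_ARRAY builds the date range on the fly,
-- so no calendar table is needed. Table names are assumed.
SELECT
  work_date AS calendar_date,
  e.name,
  IFNULL(t.hours, 0) AS hours
FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2023-08-01', DATE '2023-08-04')) AS work_date
CROSS JOIN employee AS e
LEFT JOIN timesheet AS t
  ON t.calendar_date = work_date
 AND t.employee_id = e.employee_id
ORDER BY work_date, e.name;
```

`GENERATE_DATE_ARRAY` steps by one day by default, so the range above produces four dates, each paired with every employee.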

By the way, my table design is as follows.

'calendar' table
fields:
  calendar_date, date (primary key)

'employee' table
fields:
  employee_id, smallint (primary key)
  name, varchar(30)
  hire_date, date

'timesheet' table
fields:
  timesheet_id, smallint (primary key)
  calendar_date, date
  employee_id, smallint
  hours, decimal
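
For reference, the design above could be expressed as BigQuery DDL roughly as follows. This is a sketch under several assumptions: `mydataset` is a placeholder dataset name, the generic types are mapped to BigQuery equivalents (`INT64`, `STRING(30)`, `NUMERIC`), and the foreign key on `employee_id` is an addition that the answer implies but does not state.

```sql
-- Sketch only: `mydataset` is a placeholder dataset name.
-- BigQuery key constraints are declared NOT ENFORCED: they document
-- intent but do not reject duplicate or orphaned rows.
CREATE TABLE mydataset.calendar (
  calendar_date DATE NOT NULL,
  PRIMARY KEY (calendar_date) NOT ENFORCED
);

CREATE TABLE mydataset.employee (
  employee_id INT64 NOT NULL,
  name STRING(30),
  hire_date DATE,
  PRIMARY KEY (employee_id) NOT ENFORCED
);

CREATE TABLE mydataset.timesheet (
  timesheet_id INT64 NOT NULL,
  calendar_date DATE,
  employee_id INT64,
  hours NUMERIC,
  PRIMARY KEY (timesheet_id) NOT ENFORCED,
  -- Assumed relationship: each timesheet belongs to one employee.
  FOREIGN KEY (employee_id) REFERENCES mydataset.employee (employee_id) NOT ENFORCED
);
```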

huangapple
  • Posted on August 8, 2023 at 23:56:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76861268.html