BigQuery - How do I check if data is missing from a table for a specific calendar date?
Question
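I have a table containing ascending calendar dates. I have another table containing employee timesheets. The timesheet table only gets a row when a timesheet has been created. What I am trying to find out is: for date X, did the employee complete a timesheet?
My latest query is below. I'm jumping back into SQL after a couple of years off, and I'm certain it's something simple that I'm missing: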
SELECT
  D.CALENDAR_DATE,
  E.EMPLOYEE,
  ROW_NUMBER() OVER (PARTITION BY E.EMPLOYEE, E.TIMESHEET_DATE ORDER BY D.CALENDAR_DATE ASC) AS NUMBER_OF_TIMESHEETS
FROM DIM.CALENDAR D
LEFT JOIN EMPLOYEE_TIMESHEETS E ON E.TIMESHEET_DATE = D.CALENDAR_DATE
WHERE
  D.CALENDAR_DATE BETWEEN '2023-04-01' AND '2023-07-19'
ORDER BY D.CALENDAR_DATE ASC
Desired Results:
Calendar Date | Employee | Number of Timesheets |
---|---|---|
4/1/2023 | Jerry Jones | 1 |
4/2/2023 | Jerry Jones | 1 |
4/3/2023 | Jerry Jones | 0 |
4/4/2023 | Jerry Jones | 0 |
4/5/2023 | Jerry Jones | 1 |
4/1/2023 | Sarah Smith | 1 |
4/2/2023 | Sarah Smith | 0 |
4/3/2023 | Sarah Smith | 0 |
4/4/2023 | Sarah Smith | 0 |
4/5/2023 | Sarah Smith | 1 |
Current Output:
Calendar Date | Employee | Number of Timesheets |
---|---|---|
4/1/2023 | Jerry Jones | 1 |
4/3/2023 | Jerry Jones | 1 |
4/5/2023 | Jerry Jones | 1 |
4/1/2023 | Sarah Smith | 1 |
4/5/2023 | Sarah Smith | 1 |
Answer 1
Score: 1
You could use a cartesian join and conditional aggregation with your current layout.
create table calendar (
calendar_date date
);
insert into calendar values
('2023-04-01'),
('2023-04-02'),
('2023-04-03'),
('2023-04-04'),
('2023-10-01');
create table employee_timesheets (
employee varchar(10),
timesheet_date date
);
insert into employee_timesheets values
('Jerry', '2023-04-01'),
('Jerry', '2023-04-02'),
('Tom', '2023-04-03'),
('Sarah', '2023-04-01'),
('Sarah', '2023-04-02'),
('Sarah', '2023-04-04'),
('Sarah', '2023-10-01');
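select d.calendar_date, e.employee,
  -- 1 if this employee has a timesheet on this date, otherwise 0
  max(case when d.calendar_date = e.timesheet_date then 1 else 0 end) nbr_of_timesheets
from calendar d
cross join employee_timesheets e  -- cartesian join: every date paired with every timesheet row
where d.calendar_date between '2023-04-01' and '2023-07-19'
group by 1, 2  -- one row per (calendar_date, employee)
order by 2, 1;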
Result:
calendar_date | employee | nbr_of_timesheets |
---|---|---|
2023-04-01T00:00:00.000Z | Jerry | 1 |
2023-04-02T00:00:00.000Z | Jerry | 1 |
2023-04-03T00:00:00.000Z | Jerry | 0 |
2023-04-04T00:00:00.000Z | Jerry | 0 |
2023-04-01T00:00:00.000Z | Sarah | 1 |
2023-04-02T00:00:00.000Z | Sarah | 1 |
2023-04-03T00:00:00.000Z | Sarah | 0 |
2023-04-04T00:00:00.000Z | Sarah | 1 |
2023-04-01T00:00:00.000Z | Tom | 0 |
2023-04-02T00:00:00.000Z | Tom | 0 |
2023-04-03T00:00:00.000Z | Tom | 1 |
2023-04-04T00:00:00.000Z | Tom | 0 |
You can view it on DB Fiddle.
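As a variation on this answer (not the answerer's exact query), the same idea can be written as a date × employee grid built with `CROSS JOIN`, followed by a `LEFT JOIN` back to the timesheets and a `COUNT`. This reports the actual number of timesheets per day rather than a 0/1 flag. The sketch below runs it through Python's `sqlite3` module with the answer's sample data; SQLite stands in for BigQuery here, and dates are stored as ISO-8601 text so that `BETWEEN` compares them correctly. The function name `timesheet_grid` is hypothetical. In BigQuery, if you have no calendar table, `UNNEST(GENERATE_DATE_ARRAY('2023-04-01', '2023-07-19'))` should be able to supply the dates instead.

```python
import sqlite3

# Sample data copied from the answer's DB Fiddle example.
TIMESHEETS = [
    ("Jerry", "2023-04-01"), ("Jerry", "2023-04-02"),
    ("Tom", "2023-04-03"),
    ("Sarah", "2023-04-01"), ("Sarah", "2023-04-02"),
    ("Sarah", "2023-04-04"), ("Sarah", "2023-10-01"),
]
CALENDAR = ["2023-04-01", "2023-04-02", "2023-04-03", "2023-04-04", "2023-10-01"]

QUERY = """
WITH employees AS (
    -- One row per employee who appears in the timesheet table.
    SELECT DISTINCT employee FROM employee_timesheets
)
SELECT d.calendar_date,
       e.employee,
       COUNT(t.timesheet_date) AS nbr_of_timesheets  -- NULLs from the LEFT JOIN count as 0
FROM calendar d
CROSS JOIN employees e                               -- every date paired with every employee
LEFT JOIN employee_timesheets t
       ON t.employee = e.employee
      AND t.timesheet_date = d.calendar_date
WHERE d.calendar_date BETWEEN ? AND ?
GROUP BY d.calendar_date, e.employee
ORDER BY e.employee, d.calendar_date
"""


def timesheet_grid(start, end):
    """Return (calendar_date, employee, nbr_of_timesheets) rows for the date range."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calendar (calendar_date TEXT)")
    conn.execute("CREATE TABLE employee_timesheets (employee TEXT, timesheet_date TEXT)")
    conn.executemany("INSERT INTO calendar VALUES (?)", [(d,) for d in CALENDAR])
    conn.executemany("INSERT INTO employee_timesheets VALUES (?, ?)", TIMESHEETS)
    rows = conn.execute(QUERY, (start, end)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in timesheet_grid("2023-04-01", "2023-07-19"):
        print(row)
```

For this sample data the rows match the answer's result table. One caveat of deriving the employee list from the timesheet table (true of the original query as well): an employee with no timesheets at all never appears, which is one reason to keep a separate employee table.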
Answer 2
Score: 0
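With your current design, you really should have three tables: one for employees, one for calendar dates, and one for timesheets. The timesheet table should simply reference an employee ID rather than store the employee name.
The sample query below displays every date in the given range, showing ALL employees and any hours they may have logged. If an employee logged no hours, they still appear; if nobody worked on a given date, that date still appears.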
SELECT c.calendar_date, e.name, IFNULL(t.hours, 0) AS hours
FROM calendar c
LEFT JOIN employee e ON TRUE  -- pair every date with every employee
LEFT JOIN timesheet t
  ON c.calendar_date = t.calendar_date
  AND e.employee_id = t.employee_id  -- hours is NULL (shown as 0) when no timesheet exists
WHERE c.calendar_date BETWEEN '2023-08-01' AND '2023-08-04'
ORDER BY c.calendar_date, e.name
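I'll also mention that having a calendar table that stores every possible working date is not the best design. The best solution would be to use a programming language such as PHP to build a data structure (an array or multi-dimensional array) containing all the dates for the week. You would then query the database, merge the results into that data structure, and output it. With a design like that, you would not need a calendar table at all. But if your goal is to write standard SQL, the approach above should be the best solution.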
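The application-side approach described above can be sketched as follows, in Python rather than PHP. This is a minimal sketch under assumptions: `rows` stands in for whatever (date, employee, hours) tuples your timesheet query returns, and `week_dates` and `merge_timesheets` are hypothetical helper names, not part of any library.

```python
from datetime import date, timedelta


def week_dates(start):
    """Return the seven consecutive dates beginning at `start`."""
    return [start + timedelta(days=i) for i in range(7)]


def merge_timesheets(start, employees, rows):
    """Build a (date, employee) -> hours grid for one week, pre-filled with 0,
    then add the hours from each row returned by the database."""
    grid = {(d, emp): 0 for d in week_dates(start) for emp in employees}
    for day, emp, hours in rows:
        if (day, emp) in grid:  # skip rows outside the week or for unknown employees
            grid[(day, emp)] += hours
    return grid


if __name__ == "__main__":
    # Hypothetical query results: (timesheet date, employee name, hours).
    rows = [(date(2023, 8, 1), "Jerry", 8), (date(2023, 8, 3), "Sarah", 6)]
    grid = merge_timesheets(date(2023, 7, 31), ["Jerry", "Sarah"], rows)
    for (day, emp), hours in sorted(grid.items()):
        print(day.isoformat(), emp, hours)
```

Because the grid starts out zero-filled, days without a timesheet show up as 0 without any calendar table; the trade-off is that the gap-filling logic lives in application code instead of SQL.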
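By the way, my table design is as follows.
'calendar' table
fields:
calendar_date, date (primary key)
'employee' table
fields:
employee_id, smallint (primary key)
name, varchar(30)
hire_date, date
'timesheet' table
fields:
timesheet_id, smallint (primary key)
calendar_date, date
employee_id, smallint
hours, decimal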
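To check the query above against this design, the sketch below creates the three tables in an in-memory SQLite database via Python's `sqlite3` module and runs the answer's query on a few rows. Assumptions: SQLite stands in for BigQuery, dates are stored as ISO-8601 text, smallint columns become `INTEGER`, and `ON 1 = 1` replaces `ON TRUE` (equivalent, but also accepted by older SQLite builds). The employees, hire dates, and hours are made up for illustration, and `hours_grid` is a hypothetical name.

```python
import sqlite3

# The answer's table design, adapted to SQLite types.
SCHEMA = """
CREATE TABLE calendar (calendar_date TEXT PRIMARY KEY);
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        VARCHAR(30),
    hire_date   TEXT
);
CREATE TABLE timesheet (
    timesheet_id  INTEGER PRIMARY KEY,
    calendar_date TEXT,
    employee_id   INTEGER REFERENCES employee (employee_id),
    hours         DECIMAL
);
"""

# The answer's query; ON 1 = 1 is equivalent to its ON TRUE.
QUERY = """
SELECT c.calendar_date, e.name, IFNULL(t.hours, 0) AS hours
FROM calendar c
LEFT JOIN employee e ON 1 = 1
LEFT JOIN timesheet t
       ON c.calendar_date = t.calendar_date
      AND e.employee_id = t.employee_id
WHERE c.calendar_date BETWEEN '2023-08-01' AND '2023-08-04'
ORDER BY c.calendar_date, e.name
"""


def hours_grid():
    """Load made-up sample rows and return the query's result rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO calendar VALUES (?)",
                     [("2023-08-0%d" % d,) for d in range(1, 6)])  # Aug 1..5
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                     [(1, "Jerry", "2020-01-06"), (2, "Sarah", "2021-03-15")])
    conn.executemany("INSERT INTO timesheet VALUES (?, ?, ?, ?)",
                     [(1, "2023-08-01", 1, 8), (2, "2023-08-03", 2, 7.5)])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in hours_grid():
        print(row)
```

Each date in the range appears once per employee, and `IFNULL` turns a missing timesheet into 0 hours, which matches the behavior the answer describes. If an employee could have more than one timesheet row per day, you would wrap `t.hours` in `SUM` and add a `GROUP BY` to keep one row per date and employee.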