Group by hourly interval
Question
I'm new to SQL and am having trouble building an hourly report on a database that supports HiveSQL.
Here's my dataset:
|NAME| CHECKIN_HOUR |CHECKOUT_HOUR|
|----|--------------|-------------|
| A | 00 | 00 |
| B | 00 | 01 |
| C | 00 | 02 |
| D | 00 | null |
| E | 01 | 02 |
| F | 01 | null |
And I would like to get an hourly summary report that looks like this:
|TIME| CHECKIN_NUMBER |CHECKOUT_NUMBER|STAY_NUMBER|
|----|----------------|---------------|-----------|
| 00 | 4 | 1 | 3 |
| 01 | 2 | 1 | 4 |
| 02 | 0 | 2 | 2 |
`stay_number` is the number of people who haven't checked out by the end of that hour, e.g. the `2` in the last row means that by the end of 2am, two people (D and F) haven't checked out yet. So basically I'm trying to get a summary of check-ins, check-outs and stays for each hour.
I have no idea how to compute an hourly interval table, since simply grouping by check-in or check-out hour doesn't give the expected result. All the date fields are originally Unix timestamps, so feel free to use date functions on them.
Any instructions and help would be greatly appreciated, thanks!
Answer 1
Score: 2
Here is one method that unpivots the data and uses cumulative sums:
    select hh,
           sum(ins) as checkins,
           sum(outs) as checkouts,
           -- running check-ins minus running check-outs = people still staying
           sum(sum(ins)) over (order by hh) - sum(sum(outs)) over (order by hh) as stay_number
    from ((select checkin_hour as hh, count(*) as ins, 0 as outs
           from t
           group by checkin_hour
          ) union all
          (select checkout_hour, 0 as ins, count(*) as outs
           from t
           where checkout_hour is not null  -- skip people who never checked out
           group by checkout_hour
          )
         ) c
    group by hh
    order by hh;
The idea is to count the check-ins and check-outs in each hour and then accumulate both totals across the hours. The difference between the two running totals is the number of stays.
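To check the logic against the sample data, here is a minimal Python sketch of the same idea (count per hour, then take a running difference). It is not the query itself: the `rows` list and the `hourly_report` helper are made up for illustration, and hours are stored as integers with `None` standing in for a null check-out.

```python
from collections import Counter
from itertools import accumulate

# Sample rows from the question: (name, checkin_hour, checkout_hour).
# None means the person has not checked out (null in the table).
rows = [
    ("A", 0, 0),
    ("B", 0, 1),
    ("C", 0, 2),
    ("D", 0, None),
    ("E", 1, 2),
    ("F", 1, None),
]


def hourly_report(rows):
    """Return (hour, checkins, checkouts, stays) for every hour that has an event."""
    # Per-hour counts, like the two grouped subqueries.
    ins = Counter(cin for _, cin, _ in rows)
    outs = Counter(cout for _, _, cout in rows if cout is not None)
    hours = sorted(set(ins) | set(outs))
    # Running total of (check-ins - check-outs), like the difference
    # of the two cumulative sums ordered by hour.
    net = accumulate(ins[h] - outs[h] for h in hours)
    return [(h, ins[h], outs[h], stay) for h, stay in zip(hours, net)]


report = hourly_report(rows)
for hour, checkins, checkouts, stays in report:
    print(f"{hour:02d} {checkins} {checkouts} {stays}")
```

This reproduces the three rows of the expected report. Like the query, it only emits hours in which at least one check-in or check-out happened, so an hour with no events would be missing from the output rather than shown with zeros.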