英文:
Display a level three groupby/sum from a SPARQL query as rows and columns and filter on years
问题
我有一些RDF数据,我使用下面的表格进行了简化。
主题 | 开始日期 | 结束日期 | 位置 | 技术 | 值 |
---|---|---|---|---|---|
主题1 | 2005年1月1日 | 2025年1月1日 | 区域A | 技术A | 100 |
主题2 | 2007年1月1日 | 2025年1月1日 | 区域B | 技术A | 50 |
主题3 | 2009年1月1日 | 2025年1月1日 | 区域A | 技术B | 80 |
主题4 | 2011年1月1日 | 2025年1月1日 | 区域B | 技术B | 100 |
主题5 | 2007年1月1日 | 2025年1月1日 | 区域A | 技术A | 20 |
主题6 | 2009年1月1日 | 2025年1月1日 | 区域B | 技术A | 10 |
主题7 | 2011年1月1日 | 2025年1月1日 | 区域A | 技术B | 60 |
主题8 | 2013年1月1日 | 2025年1月1日 | 区域B | 技术B | 70 |
我想编写一个SPARQL查询,执行以下操作:
- 按位置、技术和主题存在的年份进行分组,并求和值
- 主题存在的年份是根据开始日期和结束日期确定的
- 位置和技术必须呈现为行,年份必须呈现为列。
预期输出:
位置 | 技术 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
区域A | 技术A | 100 | 100 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 |
区域A | 技术B | 0 | 0 | 0 | 0 | 80 | 80 | 140 | 140 | 140 | 140 | 140 |
区域B | 技术A | 0 | 0 | 50 | 50 | 60 | 60 | 60 | 60 | 60 | 60 | 60 |
区域B | 技术B | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 100 | 170 | 170 | 170 |
我知道SPARQL不支持直接以类似电子表格的表格格式显示数据,但我尽量避免在Python等后处理中进行必要的后处理。我想将上面的预期输出导出为CSV文件。
我真的不知道如何使用SPARQL做到这一点。我已经完成了一些简单的分组/求和查询,也分别成功地基于开始日期和结束日期筛选了年份,但我真的不知道如何完成这一组合操作。
有人可以帮我解决这个问题吗?
英文:
I have some RDF data which I simplified with the below table.
Subject | StartDate | EndDate | Location | Technology | Value |
---|---|---|---|---|---|
Subject1 | 1-1-2005 | 1-1-2025 | ZoneA | TechnologyA | 100 |
Subject2 | 1-1-2007 | 1-1-2025 | ZoneB | TechnologyA | 50 |
Subject3 | 1-1-2009 | 1-1-2025 | ZoneA | TechnologyB | 80 |
Subject4 | 1-1-2011 | 1-1-2025 | ZoneB | TechnologyB | 100 |
Subject5 | 1-1-2007 | 1-1-2025 | ZoneA | TechnologyA | 20 |
Subject6 | 1-1-2009 | 1-1-2025 | ZoneB | TechnologyA | 10 |
Subject7 | 1-1-2011 | 1-1-2025 | ZoneA | TechnologyB | 60 |
Subject8 | 1-1-2013 | 1-1-2025 | ZoneB | TechnologyB | 70 |
I want to write a SPARQL query that does the following:
- Groupby the Location, Technology and the Year (in which the Subjects exists), and sum the Value
- The Year in which the subject exists is determined from the StartDate and EndDate
- The Location and Technology must be presented as rows and the year as columns.
Expected Output:
Location | Technology | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ZoneA | TechnologyA | 100 | 100 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 |
ZoneA | TechnologyB | 0 | 0 | 0 | 0 | 80 | 80 | 140 | 140 | 140 | 140 | 140 |
ZoneB | TechnologyA | 0 | 0 | 50 | 50 | 60 | 60 | 60 | 60 | 60 | 60 | 60 |
ZoneB | TechnologyB | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 100 | 170 | 170 | 170 |
I know that SPARQL does not support displaying data directly in a tabular format like a spreadsheet, but I am trying to avoid any necessary post-processing in e.g. python as much as possible. I want to export the expected output above as a csv file.
I really have no idea how to do this with SPARQL. I have done some simple groupby/sum queries and separately also managed to filter on years based on a StartDate and EndDate, but I really don't have a clue how to do the combined set of this.
Can someone please help me out with this one?
答案1
得分: 2
这是你可以尝试的内容:
让我们以2005年为例
- 在SELECT部分,你应该写一个SUM子句,像这样
(SUM(?temp2005) AS ?2005)
- 在WHERE部分,你应该有
BIND(IF(你的条件, ?theCapacity, 0) AS ?temp2005)
---> 所以实际上这样做的是将包含容量的变量绑定到另一个变量,以便我们可以区分它们的每一年。然后,它将对与“?temp2005”相关联的所有变量求和,这在我们的情况下是“?2005”。
- 注意!你需要在WHERE部分手动创建一个年份列表,使用VALUES子句,像这样:
VALUES (?Year) {(2005) (2006) ... (2015) }
以下是你的示例的TTL代码:
@prefix ex: <http://example.com/data#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:Subject1 a ex:Subject ;
ex:startDate "2005-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneA" ;
ex:technology "TechnologyA" ;
ex:value 100 .
ex:Subject2 a ex:Subject ;
ex:startDate "2007-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneB" ;
ex:technology "TechnologyA" ;
ex:value 50 .
ex:Subject3 a ex:Subject ;
ex:startDate "2009-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneA" ;
ex:technology "TechnologyB" ;
ex:value 80 .
ex:Subject4 a ex:Subject ;
ex:startDate "2011-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneB" ;
ex:technology "TechnologyB" ;
ex:value 100 .
ex:Subject5 a ex:Subject ;
ex:startDate "2007-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneA" ;
ex:technology "TechnologyA" ;
ex:value 20 .
ex:Subject6 a ex:Subject ;
ex:startDate "2009-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneB" ;
ex:technology "TechnologyA" ;
ex:value 10 .
ex:Subject7 a ex:Subject ;
ex:startDate "2011-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneA" ;
ex:technology "TechnologyB" ;
ex:value 60 .
ex:Subject8 a ex:Subject ;
ex:startDate "2013-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneB" ;
ex:technology "TechnologyB" ;
ex:value 70 .
英文:
This is what you can try :
Let's take for example year 2005
-
In the SELECT section you should write a SUM clause like this one
(SUM(?temp2005) AS ?2005)
-
In the WHERE section you should have
BIND(IF(your condition,?theCapacity, 0) AS ?temp2005)
---> So what this is actually doing is binding the variable containing the capacity to an another in a way that we can differentiate them for each year. It will then SUM all the variables associated to ``?temp2005 as a total which is in our case ?2005
-
NOTE! That you will need to create manually a list of years with the VALUES clause in your WHERE section `VALUES (?Year) {(2005) (2006) ... (2015) }
PREFIX ex: <http://example.com/data#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> SELECT ?location ?technology (SUM(?temp2005) AS ?2005) (SUM(?temp2006) AS ?2006) (SUM(?temp2007) AS ?2007) (SUM(?temp2008) AS ?2008) (SUM(?temp2009) AS ?2009) (SUM(?temp2010) AS ?2010) (SUM(?temp2011) AS ?2011) (SUM(?temp2012) AS ?2012) (SUM(?temp2013) AS ?2013) (SUM(?temp2014) AS ?2014) (SUM(?temp2015) AS ?2015) WHERE { ?subject a ex:Subject ; ex:startDate ?startDate; ex:endDate ?endDate; ex:location ?location ; ex:technology ?technology ; ex:value ?value . #Create our Year value BIND(xsd:gYear(?Year) AS ?Year) VALUES (?Year) { (2005) (2006) (2007) (2008) (2009) (2010) (2011) (2012) (2013) (2014) (2015) } #EXTRACT the year component from the date and bind them BIND(YEAR(?startDate) AS ?startYear) BIND(YEAR(?endDate) AS ?endYear) BIND(IF((?Year = 2005)&&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 0) AS ?temp2005) . BIND(IF((?Year = 2006) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 0) AS ?temp2006) . BIND(IF((?Year = 2007) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 0) AS ?temp2007) . BIND(IF((?Year = 2008) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 0) AS ?temp2008) . BIND(IF((?Year = 2009) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 0) AS ?temp2009) . BIND(IF((?Year = 2010) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 0) AS ?temp2010) . BIND(IF((?Year = 2011) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 0) AS ?temp2011) . BIND(IF((?Year = 2012) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 0) AS ?temp2012) . BIND(IF((?Year = 2013) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 0) AS ?temp2013) . BIND(IF((?Year = 2014) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 0) AS ?temp2014) . BIND(IF((?Year = 2015) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 0) AS ?temp2015) . } GROUP BY ?location ?technology ORDER BY ?location ?technology
This how the ttl with your example looks like :
@prefix ex: <http://example.com/data#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:Subject1 a ex:Subject ;
ex:startDate "2005-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneA" ;
ex:technology "TechnologyA" ;
ex:value 100 .
ex:Subject2 a ex:Subject ;
ex:startDate "2007-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneB" ;
ex:technology "TechnologyA" ;
ex:value 50 .
ex:Subject3 a ex:Subject ;
ex:startDate "2009-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneA" ;
ex:technology "TechnologyB" ;
ex:value 80 .
ex:Subject4 a ex:Subject ;
ex:startDate "2011-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneB" ;
ex:technology "TechnologyB" ;
ex:value 100 .
ex:Subject5 a ex:Subject ;
ex:startDate "2007-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneA" ;
ex:technology "TechnologyA" ;
ex:value 20 .
ex:Subject6 a ex:Subject ;
ex:startDate "2009-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneB" ;
ex:technology "TechnologyA" ;
ex:value 10 .
ex:Subject7 a ex:Subject ;
ex:startDate "2011-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneA" ;
ex:technology "TechnologyB" ;
ex:value 60 .
ex:Subject8 a ex:Subject ;
ex:startDate "2013-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneB" ;
ex:technology "TechnologyB" ;
ex:value 70 .
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论