Display a level three groupby/sum from a SPARQL query as rows and columns and filter on years

Question

I have some RDF data, which I have simplified in the table below.

Subject   StartDate  EndDate   Location  Technology   Value
Subject1  1-1-2005   1-1-2025  ZoneA     TechnologyA  100
Subject2  1-1-2007   1-1-2025  ZoneB     TechnologyA  50
Subject3  1-1-2009   1-1-2025  ZoneA     TechnologyB  80
Subject4  1-1-2011   1-1-2025  ZoneB     TechnologyB  100
Subject5  1-1-2007   1-1-2025  ZoneA     TechnologyA  20
Subject6  1-1-2009   1-1-2025  ZoneB     TechnologyA  10
Subject7  1-1-2011   1-1-2025  ZoneA     TechnologyB  60
Subject8  1-1-2013   1-1-2025  ZoneB     TechnologyB  70

I want to write a SPARQL query that does the following:

  • Group by the Location, the Technology and the Year (in which the Subject exists), and sum the Value
  • The Years in which a Subject exists are determined from its StartDate and EndDate
  • The Location and Technology must be presented as rows and the Years as columns.

Expected Output:

Location  Technology   2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015
ZoneA     TechnologyA  100   100   120   120   120   120   120   120   120   120   120
ZoneA     TechnologyB  0     0     0     0     80    80    140   140   140   140   140
ZoneB     TechnologyA  0     0     50    50    60    60    60    60    60    60    60
ZoneB     TechnologyB  0     0     0     0     0     0     100   100   170   170   170

I know that SPARQL does not support displaying data directly in a tabular format like a spreadsheet, but I am trying to avoid post-processing (e.g. in Python) as much as possible. I want to export the expected output above as a CSV file.

I really have no idea how to do this with SPARQL. I have written some simple group-by/sum queries, and separately I have managed to filter on years based on a StartDate and EndDate, but I don't have a clue how to combine the two.

Can someone please help me out with this one?

Answer 1

Score: 2

This is what you can try:

Let's take the year 2005 as an example.

  • In the SELECT section, write a SUM clause like this one:

    (SUM(?temp2005) AS ?2005)

  • In the WHERE section, add:

    BIND(IF(your condition, ?theCapacity, 0) AS ?temp2005)

---> What this does is copy the variable holding the capacity (?value in the query below) into a separate per-year variable, so that the years can be told apart: ?temp2005 holds the capacity when the condition is true for 2005 and 0 otherwise. SUM then adds up all the ?temp2005 values in each group, and that total is ?2005 in our case.

  • NOTE: you need to create the list of years manually with a VALUES clause in your WHERE section: VALUES (?Year) {(2005) (2006) ... (2015) }
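
To make the BIND/IF/SUM idea concrete, here is a minimal Python sketch of the same conditional sum, written outside SPARQL purely as an illustration. It hard-codes the sample rows from the question with the dates reduced to their years; the names ROWS, YEARS and pivot are made up for this example and are not part of the answer.

```python
from collections import defaultdict

# Sample rows from the question: (start_year, end_year, location, technology, value)
ROWS = [
    (2005, 2025, "ZoneA", "TechnologyA", 100),
    (2007, 2025, "ZoneB", "TechnologyA", 50),
    (2009, 2025, "ZoneA", "TechnologyB", 80),
    (2011, 2025, "ZoneB", "TechnologyB", 100),
    (2007, 2025, "ZoneA", "TechnologyA", 20),
    (2009, 2025, "ZoneB", "TechnologyA", 10),
    (2011, 2025, "ZoneA", "TechnologyB", 60),
    (2013, 2025, "ZoneB", "TechnologyB", 70),
]
YEARS = range(2005, 2016)  # 2005..2015, the columns of the expected output


def pivot(rows, years):
    """Sum the value per (location, technology) for every year a subject exists."""
    table = defaultdict(lambda: {year: 0 for year in years})
    for start, end, location, technology, value in rows:
        for year in years:
            # Same test as the IF(...) in each BIND: the value if active, else 0.
            table[(location, technology)][year] += value if start <= year <= end else 0
    # Sort the groups, mirroring ORDER BY ?location ?technology.
    return {key: [cols[y] for y in years] for key, cols in sorted(table.items())}


if __name__ == "__main__":
    for (location, technology), sums in pivot(ROWS, YEARS).items():
        print(location, technology, *sums)
```

Each inner loop iteration plays the role of one ?tempYYYY binding, and the running total per group plays the role of SUM with GROUP BY. For the sample rows this reproduces the four rows of the expected output in the question.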

    PREFIX ex: <http://example.com/data#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT ?location ?technology
        (SUM(?temp2005) AS ?2005)
        (SUM(?temp2006) AS ?2006)
        (SUM(?temp2007) AS ?2007)
        (SUM(?temp2008) AS ?2008)
        (SUM(?temp2009) AS ?2009)
        (SUM(?temp2010) AS ?2010)
        (SUM(?temp2011) AS ?2011)
        (SUM(?temp2012) AS ?2012)
        (SUM(?temp2013) AS ?2013)
        (SUM(?temp2014) AS ?2014)
        (SUM(?temp2015) AS ?2015)
    WHERE {
        ?subject a ex:Subject ;
            ex:startDate ?startDate ;
            ex:endDate ?endDate ;
            ex:location ?location ;
            ex:technology ?technology ;
            ex:value ?value .
        # Create our Year values: one solution per subject and candidate year
        VALUES (?Year) {
            (2005) (2006) (2007) (2008) (2009) (2010) (2011) (2012) (2013) (2014) (2015)
        }
        # Extract the year component from each date and bind it.
        # YEAR() is defined for xsd:dateTime in SPARQL 1.1; this assumes the engine also accepts xsd:date.
        BIND(YEAR(?startDate) AS ?startYear)
        BIND(YEAR(?endDate) AS ?endYear)
        # Per year column: the value if the subject exists in that year, otherwise 0
        BIND(IF((?Year = 2005) && (?Year >= ?startYear) && (?Year <= ?endYear), ?value, 0) AS ?temp2005)
        BIND(IF((?Year = 2006) && (?Year >= ?startYear) && (?Year <= ?endYear), ?value, 0) AS ?temp2006)
        BIND(IF((?Year = 2007) && (?Year >= ?startYear) && (?Year <= ?endYear), ?value, 0) AS ?temp2007)
        BIND(IF((?Year = 2008) && (?Year >= ?startYear) && (?Year <= ?endYear), ?value, 0) AS ?temp2008)
        BIND(IF((?Year = 2009) && (?Year >= ?startYear) && (?Year <= ?endYear), ?value, 0) AS ?temp2009)
        BIND(IF((?Year = 2010) && (?Year >= ?startYear) && (?Year <= ?endYear), ?value, 0) AS ?temp2010)
        BIND(IF((?Year = 2011) && (?Year >= ?startYear) && (?Year <= ?endYear), ?value, 0) AS ?temp2011)
        BIND(IF((?Year = 2012) && (?Year >= ?startYear) && (?Year <= ?endYear), ?value, 0) AS ?temp2012)
        BIND(IF((?Year = 2013) && (?Year >= ?startYear) && (?Year <= ?endYear), ?value, 0) AS ?temp2013)
        BIND(IF((?Year = 2014) && (?Year >= ?startYear) && (?Year <= ?endYear), ?value, 0) AS ?temp2014)
        BIND(IF((?Year = 2015) && (?Year >= ?startYear) && (?Year <= ?endYear), ?value, 0) AS ?temp2015)
    }
    GROUP BY ?location ?technology
    ORDER BY ?location ?technology
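
Since the SELECT and BIND lines above differ only in the year, the query text could also be generated instead of typed by hand. The following is a minimal sketch, assuming Python is acceptable for building the query string (it does no post-processing of results). build_pivot_query is a hypothetical helper written for this illustration and is not part of any SPARQL library; it follows the same VALUES/BIND/SUM shape as the query above.

```python
def build_pivot_query(first_year, last_year):
    """Return SPARQL text with one conditional-sum column per year (inclusive range)."""
    years = range(first_year, last_year + 1)
    # One "(SUM(?tempYYYY) AS ?YYYY)" projection per year.
    select_cols = "\n".join(f"    (SUM(?temp{y}) AS ?{y})" for y in years)
    # The VALUES rows, e.g. "(2005) (2006) (2007)".
    values = " ".join(f"({y})" for y in years)
    # One BIND per year: the value if the subject exists in that year, else 0.
    binds = "\n".join(
        f"    BIND(IF((?Year = {y}) && (?Year >= ?startYear) && (?Year <= ?endYear), "
        f"?value, 0) AS ?temp{y})"
        for y in years
    )
    return f"""PREFIX ex: <http://example.com/data#>
SELECT ?location ?technology
{select_cols}
WHERE {{
    ?subject a ex:Subject ;
        ex:startDate ?startDate ;
        ex:endDate ?endDate ;
        ex:location ?location ;
        ex:technology ?technology ;
        ex:value ?value .
    VALUES (?Year) {{ {values} }}
    BIND(YEAR(?startDate) AS ?startYear)
    BIND(YEAR(?endDate) AS ?endYear)
{binds}
}}
GROUP BY ?location ?technology
ORDER BY ?location ?technology
"""


if __name__ == "__main__":
    print(build_pivot_query(2005, 2015))
```

Calling build_pivot_query(2005, 2015) yields a query with the same eleven year columns; widening the range only changes the two arguments. The generated text still has to be run against a SPARQL endpoint, whose CSV export (if the endpoint offers one) would then give the table directly.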
    

This is what the Turtle (TTL) data for your example looks like:

    @prefix ex: <http://example.com/data#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    ex:Subject1 a ex:Subject ;
        ex:startDate "2005-01-01"^^xsd:date ;
        ex:endDate "2025-01-01"^^xsd:date ;
        ex:location "ZoneA" ;
        ex:technology "TechnologyA" ;
        ex:value 100 .

    ex:Subject2 a ex:Subject ;
        ex:startDate "2007-01-01"^^xsd:date ;
        ex:endDate "2025-01-01"^^xsd:date ;
        ex:location "ZoneB" ;
        ex:technology "TechnologyA" ;
        ex:value 50 .

    ex:Subject3 a ex:Subject ;
        ex:startDate "2009-01-01"^^xsd:date ;
        ex:endDate "2025-01-01"^^xsd:date ;
        ex:location "ZoneA" ;
        ex:technology "TechnologyB" ;
        ex:value 80 .

    ex:Subject4 a ex:Subject ;
        ex:startDate "2011-01-01"^^xsd:date ;
        ex:endDate "2025-01-01"^^xsd:date ;
        ex:location "ZoneB" ;
        ex:technology "TechnologyB" ;
        ex:value 100 .

    ex:Subject5 a ex:Subject ;
        ex:startDate "2007-01-01"^^xsd:date ;
        ex:endDate "2025-01-01"^^xsd:date ;
        ex:location "ZoneA" ;
        ex:technology "TechnologyA" ;
        ex:value 20 .

    ex:Subject6 a ex:Subject ;
        ex:startDate "2009-01-01"^^xsd:date ;
        ex:endDate "2025-01-01"^^xsd:date ;
        ex:location "ZoneB" ;
        ex:technology "TechnologyA" ;
        ex:value 10 .

    ex:Subject7 a ex:Subject ;
        ex:startDate "2011-01-01"^^xsd:date ;
        ex:endDate "2025-01-01"^^xsd:date ;
        ex:location "ZoneA" ;
        ex:technology "TechnologyB" ;
        ex:value 60 .

    ex:Subject8 a ex:Subject ;
        ex:startDate "2013-01-01"^^xsd:date ;
        ex:endDate "2025-01-01"^^xsd:date ;
        ex:location "ZoneB" ;
        ex:technology "TechnologyB" ;
        ex:value 70 .


huangapple
  • Posted on July 31, 2023, 22:19:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76804507.html