July 13, 2023, 17:04:39

SPARQL Namespace conflict while querying

Question

I'm quite puzzled by the behavior of my endpoint, or perhaps by how the request is processed. The basic RDFS namespace seems to clash with another definition at query time: the query fails when I declare the prefix in the body, and returns normal output when I omit it.

Setup

Query 1:

SELECT *
WHERE {
  ?sub rdfs:label ?p .
} LIMIT 5

Output 1:

INFO:root:                                             sub                   p
0  http://example.org/triples/17bbab96          Pont d Iéna-9423efbc
1  http://example.org/triples/37d3fba1          Pont d Iéna-9423efbc
2  http://example.org/triples/e8a8921a          Pont Transbordeur-fb62b01e
3  http://example.org/triples/7907d1de          Pont Transbordeur-fb62b01e
4  http://example.org/triples/5b529b5e          Pont d Iéna-98cdd2fc

Query 2:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
  ?sub rdfs:label ?p .
} LIMIT 5

Output 2 (Client Side):

(...)
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
(...)
ValueError: You did something wrong formulating either the URI or your SPARQL query

Output 2 (Server Side):

[INFO ] 2023-07-13 08:50:13,797 [repositories/astra1 | c.o.f.s.GraphDBProtocolExceptionResolver] X-Request-Id: 712a09f4-626e-5f2a-b22b-5d436e2c4ae2 Client sent bad request (400)
org.eclipse.rdf4j.http.server.ClientHTTPException: MALFORMED QUERY: Multiple prefix declarations for prefix 'rdfs'

Querying using rdflib on graphdb endpoint (rdf4j)

import os
import logging
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

from dotenv import load_dotenv
load_dotenv()

from rdflib import Graph
from rdflib.plugins.stores import sparqlstore
from rdflib.plugins.sparql.processor import SPARQLResult

from requests.auth import HTTPDigestAuth

from pandas import DataFrame

def sparql_results_to_df(results: SPARQLResult) -> DataFrame:
    """
    Export results from an rdflib SPARQL query into a `pandas.DataFrame`,
    using Python types. See https://github.com/RDFLib/rdflib/issues/1179.
    """
    return DataFrame(
        data=([None if x is None else x.toPython() for x in row] for row in results),
        columns=[str(x) for x in results.vars],
    )


if __name__ == '__main__':
  store = sparqlstore.SPARQLUpdateStore(query_endpoint=os.environ['SPARQL_ENDPOINT_QUERY'], update_endpoint=os.environ['SPARQL_ENDPOINT_UPDATE']) #,
                          # auth=HTTPDigestAuth(config.AUTH_USER, config.AUTH_PASS), context_aware=True,

  g = Graph(store=store, identifier=os.environ['SPARQL_DEFAULT_NAMED_GRAPH_FULL_URI']) # namespace_manager=None

  q_sa = """
    select * where {
      ?s ?p ?o .
    } limit 20
    """

  q_sa2 = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT *
    WHERE {
      ?sub rdfs:label ?p .
    } LIMIT 20
  """

  qr = g.query(q_sa2)
  df = sparql_results_to_df(qr)
  logging.info(df)

Expectation

I'd have expected the opposite: Query 1 failing with an "Undefined Prefix" error and Query 2 returning my results.
Is there a way to get that behavior by changing something on the client side or the server side? Is this a bad idea? (I prefer to have everything in the queries, even the most basic namespaces.)

I'd be glad to read your thoughts on that.
Thanks in advance for your answers!

Answer 1

Score: 2

Thanks @UninformedUser, you've put me on the right track! It was hard to figure out where the error was triggered (rdflib's Graph? sparqlstore? the endpoint config?).

Alas, passing an empty initNs doesn't work, because in the source it is overridden with the graph's default namespaces: initNs = initNs or dict(self.namespaces()) # noqa: N806
An empty dict is falsy, so the "or" falls back to the graph's own bindings.

According to the Namespace bindings section of the RDFLib docs, each graph ships with default namespaces already bound.

The solution, then, is to override the default graph config: g = Graph(store=store, identifier=os.environ['SPARQL_DEFAULT_NAMED_GRAPH_FULL_URI'], bind_namespaces="none")

Solved! (I will mark it as accepted in 2 days.)
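To make the mechanism concrete, here is a minimal, standard-library-only sketch of why Query 2 ends up with two declarations of the same prefix. It assumes the client prepends one PREFIX line per namespace bound on the graph before sending the query. The helper names (inject_prefixes, count_prefix_declarations) and the single-entry bindings dict are hypothetical and are not rdflib's actual code.

```python
import re
from typing import Mapping

RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def inject_prefixes(query: str, bindings: Mapping[str, str]) -> str:
    """Prepend one PREFIX line per binding (hypothetical helper, for illustration)."""
    if not bindings:
        return query  # nothing is bound, so the query is sent unchanged
    header = "\n".join(f"PREFIX {p}: <{uri}>" for p, uri in bindings.items())
    return f"{header}\n\n{query}"


def count_prefix_declarations(query: str, prefix: str) -> int:
    """Count the PREFIX lines in the query that declare the given prefix."""
    pattern = re.compile(
        rf"^\s*PREFIX\s+{re.escape(prefix)}:", re.IGNORECASE | re.MULTILINE
    )
    return len(pattern.findall(query))


query_1 = "SELECT * WHERE { ?sub rdfs:label ?p . } LIMIT 5"
query_2 = f"PREFIX rdfs: <{RDFS}>\n{query_1}"

# Assumption: stands in for the namespaces a graph binds by default.
default_bindings = {"rdfs": RDFS}
# Assumption: stands in for a graph created with no default bindings.
no_bindings: dict = {}

sent_1 = inject_prefixes(query_1, default_bindings)  # prefix supplied by the client
sent_2 = inject_prefixes(query_2, default_bindings)  # prefix declared twice
sent_3 = inject_prefixes(query_2, no_bindings)       # only the query's own prefix

for name, sent in [("sent_1", sent_1), ("sent_2", sent_2), ("sent_3", sent_3)]:
    print(name, count_prefix_declarations(sent, "rdfs"))
```

Under these assumptions, sent_2 carries two rdfs declarations, which matches the "Multiple prefix declarations" message in the server log, while sent_3 corresponds to the bind_namespaces="none" case, where only the query's own declaration is sent.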
