SPARQL Namespace conflict while querying
Question
I'm quite puzzled by the behavior of my endpoint, or rather by the processing of the request. The basic RDFS namespace seems to clash with another definition when querying: declaring the prefix results in an error, while omitting the prefix from the query body yields normal output.
Setup
Query 1:
SELECT *
WHERE {
?sub rdfs:label ?p .
} LIMIT 5
Output 1:
INFO:root: sub p
0 http://example.org/triples/17bbab96 Pont d Iéna-9423efbc
1 http://example.org/triples/37d3fba1 Pont d Iéna-9423efbc
2 http://example.org/triples/e8a8921a Pont Transbordeur-fb62b01e
3 http://example.org/triples/7907d1de Pont Transbordeur-fb62b01e
4 http://example.org/triples/5b529b5e Pont d Iéna-98cdd2fc
Query 2:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
?sub rdfs:label ?p .
} LIMIT 5
Output 2 (Client Side):
(...)
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
(...)
ValueError: You did something wrong formulating either the URI or your SPARQL query
Output 2 (Server Side):
[INFO ] 2023-07-13 08:50:13,797 [repositories/astra1 | c.o.f.s.GraphDBProtocolExceptionResolver] X-Request-Id: 712a09f4-626e-5f2a-b22b-5d436e2c4ae2 Client sent bad request (400)
org.eclipse.rdf4j.http.server.ClientHTTPException: MALFORMED QUERY: Multiple prefix declarations for prefix 'rdfs'
Querying using rdflib on graphdb endpoint (rdf4j)
import os
import logging

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

from dotenv import load_dotenv

load_dotenv()

from rdflib import Graph
from rdflib.plugins.stores import sparqlstore
from rdflib.plugins.sparql.processor import SPARQLResult
from requests.auth import HTTPDigestAuth
from pandas import DataFrame


def sparql_results_to_df(results: SPARQLResult) -> DataFrame:
    """
    Export results from an rdflib SPARQL query into a `pandas.DataFrame`,
    using Python types. See https://github.com/RDFLib/rdflib/issues/1179.
    """
    return DataFrame(
        data=([None if x is None else x.toPython() for x in row] for row in results),
        columns=[str(x) for x in results.vars],
    )


if __name__ == '__main__':
    # Remote GraphDB (RDF4J) repository, addressed through rdflib's SPARQLUpdateStore.
    store = sparqlstore.SPARQLUpdateStore(
        query_endpoint=os.environ['SPARQL_ENDPOINT_QUERY'],
        update_endpoint=os.environ['SPARQL_ENDPOINT_UPDATE'],
        # auth=HTTPDigestAuth(config.AUTH_USER, config.AUTH_PASS), context_aware=True,
    )
    g = Graph(store=store, identifier=os.environ['SPARQL_DEFAULT_NAMED_GRAPH_FULL_URI'])  # namespace_manager=None

    # Generic select-all query (not used in this run).
    q_sa = """
    select * where {
        ?s ?p ?o .
    } limit 20
    """
    # Query with an explicit rdfs PREFIX declaration (the one that triggers the 400).
    q_sa2 = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT *
    WHERE {
        ?sub rdfs:label ?p .
    } LIMIT 20
    """

    qr = g.query(q_sa2)
    df = sparql_results_to_df(qr)
    logging.info(df)
Expectation
I would have expected the opposite: Query 1 failing with an "undefined prefix" error and Query 2 returning my results.
Is there a way to get that behavior by changing something client-side or server-side? Is this a bad idea? (I prefer to have everything in the queries, even the most basic namespaces.)
I'd be glad to read your thoughts on that.
Thanks in advance for your answers!
Answer 1
Score: 2
Thanks @UninformedUser, you've put me on the right track! It was hard to figure out where the error was raised (rdflib's graph? sparqlstore? endpoint config?).
Unfortunately, passing an empty initNs doesn't work, because in the source it is overridden with the graph's default namespaces: initNs = initNs or dict(self.namespaces())  # noqa: N806
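To make that fallback concrete, here is a minimal sketch of what the quoted line does (plain Python semantics, not rdflib code; the namespace dict below is an illustrative stand-in for the graph's default bindings):

# An empty dict is falsy, so the "or" falls back to the graph's own bindings.
initNs = {}  # what the caller passed
graph_defaults = {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"}  # stand-in for dict(self.namespaces())
initNs = initNs or graph_defaults  # {} is falsy, so the defaults win
print(initNs)  # {'rdfs': 'http://www.w3.org/2000/01/rdf-schema#'}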
Looking at the Namespace bindings section of the RDFLib docs, every graph ships with default namespace bindings.
The solution is therefore to override the default graph configuration: g = Graph(store=store, identifier=os.environ['SPARQL_DEFAULT_NAMED_GRAPH_FULL_URI'], bind_namespaces="none")
Solved! (I'll mark it as accepted in 2 days.)
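For reference, a minimal sketch of the fix applied to the script from the question (assuming the same environment variables and an rdflib version recent enough to support bind_namespaces, 6.2 or later as I understand it; nothing here is specific to GraphDB):

import os

from rdflib import Graph
from rdflib.plugins.stores import sparqlstore

store = sparqlstore.SPARQLUpdateStore(
    query_endpoint=os.environ['SPARQL_ENDPOINT_QUERY'],
    update_endpoint=os.environ['SPARQL_ENDPOINT_UPDATE'],
)
# With bind_namespaces="none" rdflib binds no default prefixes, so nothing is
# injected in front of the query text and the query's own PREFIX declaration
# is the only one the endpoint sees.
g = Graph(
    store=store,
    identifier=os.environ['SPARQL_DEFAULT_NAMED_GRAPH_FULL_URI'],
    bind_namespaces="none",
)

query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE { ?sub rdfs:label ?p . } LIMIT 5
"""
for row in g.query(query):
    print(row)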