SPARQL Namespace conflict while querying

Question

I'm quite puzzled by the behavior of my endpoint, or possibly by how the request is processed. The standard RDFS namespace seems to clash with another definition at query time: declaring the prefix in the query raises an error, while omitting it returns normal results.

Setup

Query 1:

SELECT *
WHERE {
  ?sub rdfs:label ?p .
} LIMIT 5

Output 1:

INFO:root:                                             sub                   p         
0  http://example.org/triples/17bbab96          Pont d Iéna-9423efbc
1  http://example.org/triples/37d3fba1          Pont d Iéna-9423efbc
2  http://example.org/triples/e8a8921a          Pont Transbordeur-fb62b01e
3  http://example.org/triples/7907d1de          Pont Transbordeur-fb62b01e
4  http://example.org/triples/5b529b5e          Pont d Iéna-98cdd2fc

Query 2:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
  ?sub rdfs:label ?p .
} LIMIT 5

Output 2 (Client Side):

(...)
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
(...)
ValueError: You did something wrong formulating either the URI or your SPARQL query

Output 2 (Server Side):

[INFO ] 2023-07-13 08:50:13,797 [repositories/astra1 | c.o.f.s.GraphDBProtocolExceptionResolver] X-Request-Id: 712a09f4-626e-5f2a-b22b-5d436e2c4ae2 Client sent bad request (400)
org.eclipse.rdf4j.http.server.ClientHTTPException: MALFORMED QUERY: Multiple prefix declarations for prefix 'rdfs'

Querying with rdflib against a GraphDB endpoint (RDF4J)

import os
import logging
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

from dotenv import load_dotenv
load_dotenv()

from rdflib import Graph
from rdflib.plugins.stores import sparqlstore
from rdflib.plugins.sparql.processor import SPARQLResult

from requests.auth import HTTPDigestAuth

from pandas import DataFrame

def sparql_results_to_df(results: SPARQLResult) -> DataFrame:
    """
    Export results from an rdflib SPARQL query into a `pandas.DataFrame`,
    using Python types. See https://github.com/RDFLib/rdflib/issues/1179.
    """
    return DataFrame(
        data=([None if x is None else x.toPython() for x in row] for row in results),
        columns=[str(x) for x in results.vars],
    )


if __name__ == '__main__':
  store = sparqlstore.SPARQLUpdateStore(
      query_endpoint=os.environ['SPARQL_ENDPOINT_QUERY'],
      update_endpoint=os.environ['SPARQL_ENDPOINT_UPDATE'],
      # auth=HTTPDigestAuth(config.AUTH_USER, config.AUTH_PASS), context_aware=True,
  )

  g = Graph(store=store, identifier=os.environ['SPARQL_DEFAULT_NAMED_GRAPH_FULL_URI']) # namespace_manager=None

  q_sa ="""
    select * where { 
      ?s ?p ?o .
    } limit 20 
    """
  
  q_sa2 = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT *
    WHERE {
      ?sub rdfs:label ?p .
    } LIMIT 20
  """
  
  qr = g.query(q_sa2)
  df = sparql_results_to_df(qr)
  logging.info(df)
  

Expectation

I expected the opposite: Query 1 failing with an "undefined prefix" error and Query 2 returning my results.
Is there a way to get that behavior by changing something on the client side or the server side? Is it a bad idea? (I prefer to keep everything in the queries, even the most basic namespaces.)

I'd be glad to read your thoughts on this.
Thanks in advance for your answers!

Answer 1

Score: 2

Thanks @UninformedUser, you put me on the right track! It was hard to figure out where the error was raised (rdflib's Graph? the sparqlstore? the endpoint configuration?).

Alas, passing an empty initNs doesn't work, because in the source it is overridden with the graph's default namespaces: initNs = initNs or dict(self.namespaces())  # noqa: N806
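To make the mechanism concrete, here is a minimal standard-library sketch of what appears to happen. It is a simplified imitation, not rdflib's actual code: the helper names (inject_prefixes, effective_query, duplicated_prefixes) and the two-entry DEFAULT_BINDINGS table are hypothetical, and it assumes the client prepends one PREFIX line per bound namespace before sending the query.

```python
import re

# Assumed subset of the bindings a graph carries by default (illustrative only).
DEFAULT_BINDINGS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def inject_prefixes(query, bindings):
    """Prepend one PREFIX line per binding (hypothetical stand-in for the client)."""
    header = "\n".join(f"PREFIX {p}: <{ns}>" for p, ns in bindings.items())
    return f"{header}\n\n{query}" if header else query


def effective_query(query, init_ns=None, graph_bindings=DEFAULT_BINDINGS):
    """Build the query text that would be sent to the endpoint."""
    # Same shape as `initNs = initNs or dict(self.namespaces())`:
    # an empty dict is falsy, so the graph's own bindings are used instead.
    init_ns = init_ns or dict(graph_bindings)
    return inject_prefixes(query, init_ns)


def duplicated_prefixes(query):
    """Return the prefixes that are declared more than once in the query text."""
    declared = re.findall(r"(?im)^\s*PREFIX\s+([^\s:]*):", query)
    return sorted({p for p in declared if declared.count(p) > 1})


query_1 = "SELECT * WHERE { ?sub rdfs:label ?p . } LIMIT 5"
query_2 = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n" + query_1

# Query 1 borrows rdfs from the injected header, so it runs without declaring it.
print(duplicated_prefixes(effective_query(query_1, init_ns={})))
# Query 2 declares rdfs itself and also receives it from the header.
print(duplicated_prefixes(effective_query(query_2, init_ns={})))
# With no graph bindings at all, Query 2 keeps its single declaration.
print(duplicated_prefixes(effective_query(query_2, init_ns={}, graph_bindings={})))
```

Under these assumptions, the second call is the case the server rejects with "Multiple prefix declarations for prefix 'rdfs'", and an empty init_ns changes nothing because of the `or` fallback.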

According to the "Namespace bindings" section of the RDFLib docs, each graph ships with default namespace bindings.

The solution, then, is to override the graph's default configuration: g = Graph(store=store, identifier=os.environ['SPARQL_DEFAULT_NAMED_GRAPH_FULL_URI'], bind_namespaces="none")

Solved! (I will mark it as accepted in 2 days.)

huangapple
  • Posted on July 13, 2023 at 17:04:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76677670.html