如何在Drill中查询Parquet中的WKT列?

huangapple go评论61阅读模式
英文:

How to query a WKT column in parquet with Drill?

问题

I have a text WKT column of geospatial data in my parquet files I want to query in Apache Drill. I am running Drill version 1.21.1.

The parquet files have this layout (output from parquet-tools):

...
############ Column(wkt) ############
name: wkt
path: wkt
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: ZSTD (space_saved: 66%)
...

I get this error when using ST_AsGeoJSON:

apache drill> select ST_AsGeoJSON(wkt) from dfs.`/Users/matth/projects/parquet` limit 1;
Error: SYSTEM ERROR: GeometryException: invalid shape type

So I output the WKT and tried it as a string:

apache drill> select wkt from dfs.`/Users/matth/projects/ll/parquet` limit 1;
+----------------------------------------------------------------------------------+
|                                       wkt                                        |
+----------------------------------------------------------------------------------+
| POLYGON((-101.70891499999999 41.1128509,-101.7109674 41.112846499999996,-101.7109632 41.112755899999996,-101.70867039999999 41.112760699999995,-101.7086707 41.1128514,-101.70891499999999 41.1128509)) |
+----------------------------------------------------------------------------------+

The geometry is valid in Postgres:

select ST_IsValid(ST_GeomFromText('POLYGON((-101.70891499999999 41.1128509,-101.7109674 41.112846499999996,-101.7109632 41.112755899999996,-101.70867039999999 41.112760699999995,-101.7086707 41.1128514,-101.70891499999999 41.1128509))'));

And it doesn't want to work in Drill:

apache drill> select ST_AsGeoJSON('POLYGON((-101.70891499999999 41.1128509,-101.7109674 41.112846499999996,-101.7109632 41.112755899999996,-101.70867039999999 41.112760699999995,-101.7086707 41.1128514,-101.70891499999999 41.1128509))');
Error: SYSTEM ERROR: GeometryException: invalid shape type

Please, refer to logs for more information.

The error in the logs is (can post more; truncated here):

[Error Id: 1d5eed81-8eea-4e4a-96b3-9151a9282dc5 on siwenna:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: GeometryException: invalid shape type

Please, refer to logs for more information.

...
Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Error while applying rule ReduceAndSimplifyProjectRule, args [rel#10013:LogicalProject.NONE.ANY([]).[](input=RelSubset#10012,exprs=[ST_ASGEOJSON('POLYGON((-101.70891499999999 41.1128509,-101.7109674 41.112846499999996,-101.7109632 41.112755899999996,-101.70867039999999 41.112760699999995,-101.7086707 41.1128514,-101.70891499999999 41.1128509))'])]
	at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:301)
	... 3 common frames omitted
Caused by: java.lang.RuntimeException: Error while applying rule ReduceAndSimplifyProjectRule, args [rel#10013:LogicalProject.NONE.ANY([]).[](input=RelSubset#10012,exprs=[ST_ASGEOJSON('POLYGON((-101.70891499999999 41.1128509,-101.7109674 41.112846499999996,-101.7109632 41.112755899999996,-101.70867039999999 41.112760699999995,-101.7086707 41.1128514,-101.70891499999999 41.1128509))'])]
...
Caused by: com.esri.core.geometry.GeometryException: invalid shape type
	at com.esri.core.geometry.OperatorImportFromWkbLocal.importFromWkb(OperatorImportFromWkbLocal.java:366)
	at com.esri.core.geometry.OperatorImportFromWkbLocal.executeOGC(OperatorImportFromWkbLocal.java:183)
	at com.esri.core.geometry.ogc.OGCGeometry.fromBinary(OGCGeometry.java:597)
	at org.apache.drill.exec.udfs.gis.STAsGeoJSON.eval(STAsGeoJSON.java:51)
	at org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator.evaluateFunction(InterpreterEvaluator.java:149)
	at org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator$EvalVisitor.visitFunctionHolderExpression(InterpreterEvaluator.java:363)
	... 26 common frames omitted
2023-05-17 10:17:47,507 [Client-1] INFO  o.a.d.e.r.u.BlockingResultsListener - [#53] Query failed: 
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: GeometryException: invalid shape type
...

I tried a few other things, including ST_GeoFromText, where I got this error:

apache drill> select ST_GeoFromText(wkt) from dfs.`/Users/matth/projects/parquet`;
Error: VALIDATION ERROR: From line 1, column 8 to line 1, column 26: No match found for function signature ST_GeoFromText(<ANY>)
英文:

I have a text WKT column of geospatial data in my parquet files I want to query in Apache Drill. I am running Drill version 1.21.1.

The parquet files have this layout (output from parquet-tools):

...
############ Column(wkt) ############
name: wkt
path: wkt
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: ZSTD (space_saved: 66%)
...

I get this error when using ST_AsGeoJSON:

apache drill&gt; select ST_AsGeoJSON(wkt) from dfs.`/Users/matth/projects/parquet` limit 1;
Error: SYSTEM ERROR: GeometryException: invalid shape type

So I output the WKT and tried it as a string:

apache drill&gt; select wkt from dfs.`/Users/matth/projects/ll/parquet` limit 1;
+----------------------------------------------------------------------------------+
|                                       wkt                                        |
+----------------------------------------------------------------------------------+
| POLYGON((-101.70891499999999 41.1128509,-101.7109674 41.112846499999996,-101.7109632 41.112755899999996,-101.70867039999999 41.112760699999995,-101.7086707 41.1128514,-101.70891499999999 41.1128509)) |
+----------------------------------------------------------------------------------+

The geometry is valid in Postgres:

select ST_IsValid(ST_GeomFromText(&#39;POLYGON((-101.70891499999999 41.1128509,-101.7109674 41.112846499999996,-101.7109632 41.112755899999996,-101.70867039999999 41.112760699999995,-101.7086707 41.1128514,-101.70891499999999 41.1128509))&#39;));

And it doesn't want to work in Drill:

apache drill&gt; select ST_AsGeoJSON(&#39;POLYGON((-101.70891499999999 41.1128509,-101.7109674 41.112846499999996,-101.7109632 41.112755899999996,-101.70867039999999 41.112760699999995,-101.7086707 41.1128514,-101.70891499999999 41.1128509))&#39;);
Error: SYSTEM ERROR: GeometryException: invalid shape type


Please, refer to logs for more information.

The error in the logs is (can post more; truncated here)

[Error Id: 1d5eed81-8eea-4e4a-96b3-9151a9282dc5 on siwenna:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: GeometryException: invalid shape type


Please, refer to logs for more information.

...
Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Error while applying rule ReduceAndSimplifyProjectRule, args [rel#10013:LogicalProject.NONE.ANY([]).[](input=RelSubset#10012,exprs=[ST_ASGEOJSON(&#39;POLYGON((-101.70891499999999 41.1128509,-101.7109674 41.112846499999996,-101.7109632 41.112755899999996,-101.70867039999999 41.112760699999995,-101.7086707 41.1128514,-101.70891499999999 41.1128509))&#39;)])]
	at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:301)
	... 3 common frames omitted
Caused by: java.lang.RuntimeException: Error while applying rule ReduceAndSimplifyProjectRule, args [rel#10013:LogicalProject.NONE.ANY([]).[](input=RelSubset#10012,exprs=[ST_ASGEOJSON(&#39;POLYGON((-101.70891499999999 41.1128509,-101.7109674 41.112846499999996,-101.7109632 41.112755899999996,-101.70867039999999 41.112760699999995,-101.7086707 41.1128514,-101.70891499999999 41.1128509))&#39;)])]
...
Caused by: com.esri.core.geometry.GeometryException: invalid shape type
	at com.esri.core.geometry.OperatorImportFromWkbLocal.importFromWkb(OperatorImportFromWkbLocal.java:366)
	at com.esri.core.geometry.OperatorImportFromWkbLocal.executeOGC(OperatorImportFromWkbLocal.java:183)
	at com.esri.core.geometry.ogc.OGCGeometry.fromBinary(OGCGeometry.java:597)
	at org.apache.drill.exec.udfs.gis.STAsGeoJSON.eval(STAsGeoJSON.java:51)
	at org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator.evaluateFunction(InterpreterEvaluator.java:149)
	at org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator$EvalVisitor.visitFunctionHolderExpression(InterpreterEvaluator.java:363)
	... 26 common frames omitted
2023-05-17 10:17:47,507 [Client-1] INFO  o.a.d.e.r.u.BlockingResultsListener - [#53] Query failed: 
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: GeometryException: invalid shape type
...

I tried a few other things, including ST_GeoFromText, where I got this error:

apache drill&gt; select ST_GeoFromText(wkt) from dfs.`/Users/matth/projects/parquet`;
Error: VALIDATION ERROR: From line 1, column 8 to line 1, column 26: No match found for function signature ST_GeoFromText(&lt;ANY&gt;)

答案1

得分: 1

A few things:

  1. Drill没有ST_isValid()函数来验证几何图形,所以肯定不会起作用。

  2. 我使用以下查询将WKT转换为几何图形:

SELECT ST_AsJSON(
   ST_GeomFromText(
     'POLYGON((-101.70891499999999 41.1128509,-101.7109674 41.112846499999996,-101.7109632 41.112755899999996,-101.70867039999999 41.112760699999995,-101.7086707 41.1128514,-101.70891499999999 41.1128509))'
  )
)

WKT文件中的内容是否可能出现格式问题,导致Drill无法处理?

供其他人参考,这是Drill关于GIS查询的直接链接:https://drill.apache.org/docs/gis-functions/

希望这有所帮助。

英文:

A few things:

  1. Drill doesn't have an ST_isValid() function for geometries so for sure that won't work.

  2. I got the conversion from WKT to geometry to work with the following query:

SELECT ST_AsJSON(
   ST_GeomFromText(
     &#39;POLYGON((-101.70891499999999 41.1128509,-101.7109674 41.112846499999996,-101.7109632 41.112755899999996,-101.70867039999999 41.112760699999995,-101.7086707 41.1128514,-101.70891499999999 41.1128509))&#39;
  )
)

Is it possible that the WKT in the parquet file is malformed somehow that Drill doesn't like it?

For anyone else's reference, here is a direct link to the Drill documentation on GIS queries: https://drill.apache.org/docs/gis-functions/

I hope this helps.

huangapple
  • 本文由 发表于 2023年5月17日 22:28:06
  • 转载请务必保留本文链接:https://go.coder-hub.com/76273213.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定