使用Java语言在Spark中读取一个二进制列。

huangapple go评论81阅读模式
英文:

Read a binary column in spark using java language

问题

I have a DataFrame which contains a Binary column Type.

DataFrame:

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|BinaryGeometry                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[00 00 00 00 01 03 00 00 00 01 00 00 00 11 00 00 00 04 00 F0 00 DC CC 1A C0 87 14 01 81 1E 1B 41 40 FC FF EF 00 68 AA 1A C0 BF EE 57 20 85 19 41 40 04 00 F0 00 8C 86 1A C0 CC DC 8B DC AE 1A 41 40 FF FF EF 00 44 74 1A C0 CA 9D 5D 61 10 1C 41 40 FF FF EF 00 64 63 1A C0 BF 1F 98 0B 3A 1D 41 40 FF FF EF 00 44 47 1A C0 E4 6B A0 DD CE 1D 41 40 FC FF EF 00 D8 2B 1A C0 54 E4 71 67 6D 1C 41 40 FF FF EF 00 44 1A 1A C0 BF 1F 98 0B 3A 1D 41 40 02 00 F0 00 80 0B 1A C0 0D 80 00 13 2F 23 41 40 02 00 F0 00 B0 35 1A C0 CC F6 23 F8 BD 26 41 40 04 00 F0 00 0C 43 1A C0 73 1A 44 AF 16 26 41 40 02 00 F0 00 40 5A 1A C0 FF 54 9C 7C 2D 27 41 40 02 00 F0 00 50 68 1A C0 87 6E B9 42 44 28 41 40 02 00 F0 00 00 7C 1A C0 78 2B 85 BA F5 26 41 40 FC FF EF 00 18 91 1A C0 49 96 6F 58 C6 28 41 40 02 00 F0 00 B0 BC 1A C0 91 FA 4B 0E 7F 20 41 40 04 00 F0 00 DC CC 1A C0 87 14 01 81 1E 1B 41 40] |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I'm trying to read this column to extract the geometry format.

From my research, I found that there is a function from the GeoSpark library called ST_GeomFromWKB which takes a long binary as an argument.

So I'm doing the following code:

df.withColumn("BinaryGeometry", hex(col("BinaryGeometry")))
  .withColumn("BinaryGeometry", expr("ST_GeomFromWKB(BinaryGeometry)"))

I get the following output which is not correct:

POINT(0 0)

How can I read this column to get the correct geometry value?

EDIT

The structure of the column in MySQL:
使用Java语言在Spark中读取一个二进制列。

And when I track the data, it seems like this:
使用Java语言在Spark中读取一个二进制列。

When I click on [GEOMETRY - 113 o], a text file downloads.

The text file contains this data:

                 ے FPہ>Pے{ح@@€àOہZDچ4ح@@‏ہ¨KہإضآTح@@‏àمJہî=Oƒح@@û ڑLہCBsn«ح@@ے FPہ>Pے{ح@@

EDIT2

I have this table in the MySQL database:

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|BinaryGeometry
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|'POLYGON((-6.70005799736828 34.2118684058451,-6.66641236748546 34.1993751935347,-6.63139344658703 34.208461349766,-6.6135406633839 34.2192498881446,-6.5970611711964 34.2283339016717,-6.5695953508839 34.2328755410488,-6.54281617607921 34.2220887476075,-6.5256500383839 34.2283339016717,-6.51123048271984 34.2748740913813,-6.55242921318859 34.3026724029165,-6.56547547783703 34.2975672800575,-6.58813477959484 34.3060756457653,-6.60186768975109 34.314583149474,-6.62109376396984 34.3043740415805,-6.64169312920421 34.318553022848,-6.68426515068859 34.2538774367323,-6.70005799736828 34.2118684058451))',0
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


When I load this table in Spark, I get:

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|BinaryGeometry
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[00 00 00 00 01 03 00 00 00 01 00 00 00 11 00 00 00 04 00 F0 00 DC CC 1A C0 87 14 01 81 1E 1B 41 40 FC FF EF 00 68 AA 1A C0 BF EE 57 20 85 19 41 40 04 00 F0 00 8C 86 1A C0 CC DC 8B DC AE 1A 41 40 FF FF EF 00 44 74 1A C0 CA 9D 5D 61 10 1C 41 40 FF FF EF 00 64 63 1A C0 BF 1F 98 0B 3A 1D

英文:

I have a DataFrame witch contains a Binary column Type.

DataFrame :

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|BinaryGeometry                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[00 00 00 00 01 03 00 00 00 01 00 00 00 11 00 00 00 04 00 F0 00 DC CC 1A C0 87 14 01 81 1E 1B 41 40 FC FF EF 00 68 AA 1A C0 BF EE 57 20 85 19 41 40 04 00 F0 00 8C 86 1A C0 CC DC 8B DC AE 1A 41 40 FF FF EF 00 44 74 1A C0 CA 9D 5D 61 10 1C 41 40 FF FF EF 00 64 63 1A C0 BF 1F 98 0B 3A 1D 41 40 FF FF EF 00 44 47 1A C0 E4 6B A0 DD CE 1D 41 40 FC FF EF 00 D8 2B 1A C0 54 E4 71 67 6D 1C 41 40 FF FF EF 00 44 1A 1A C0 BF 1F 98 0B 3A 1D 41 40 02 00 F0 00 80 0B 1A C0 0D 80 00 13 2F 23 41 40 02 00 F0 00 B0 35 1A C0 CC F6 23 F8 BD 26 41 40 04 00 F0 00 0C 43 1A C0 73 1A 44 AF 16 26 41 40 02 00 F0 00 40 5A 1A C0 FF 54 9C 7C 2D 27 41 40 02 00 F0 00 50 68 1A C0 87 6E B9 42 44 28 41 40 02 00 F0 00 00 7C 1A C0 78 2B 85 BA F5 26 41 40 FC FF EF 00 18 91 1A C0 49 96 6F 58 C6 28 41 40 02 00 F0 00 B0 BC 1A C0 91 FA 4B 0E 7F 20 41 40 04 00 F0 00 DC CC 1A C0 87 14 01 81 1E 1B 41 40] |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I'm triying to read this column to extract the geometry format.

from my research i found that there is a function from the geospark library which is ST_GeomFromWKB which takes a long binary as argument.

SO I'm doing the following code :

df.withColumn("BinaryGeometry", hex(col("BinaryGeometry")))
                    .withColumn("BinaryGeometry",expr("ST_GeomFromWKB(BinaryGeometry)"))

I get the following output witch is not correct :

POINT(0 0)

How can I read this column to get the correct geometry value ?

EDIT

The structute of column on MySQL
使用Java语言在Spark中读取一个二进制列。

And when I track the data it seems like that :
使用Java语言在Spark中读取一个二进制列。

When I click on
[GEOMETRY - 113 o]

a text file downloads.

The text file contains this data :

             ے FPہ>Pے{ح@@€àOہZDچ4ح@@‏ہ¨KہإضآTح@@‏àمJہî=Oƒح@@û ڑLہCBsn«ح@@ے FPہ>Pے{ح@@

EDIT2

I have this table on MySQL database

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|BinaryGeometry
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|'POLYGON((-6.70005799736828 34.2118684058451,-6.66641236748546 34.1993751935347,-6.63139344658703 34.208461349766,-6.6135406633839 34.2192498881446,-6.5970611711964 34.2283339016717,-6.5695953508839 34.2328755410488,-6.54281617607921 34.2220887476075,-6.5256500383839 34.2283339016717,-6.51123048271984 34.2748740913813,-6.55242921318859 34.3026724029165,-6.56547547783703 34.2975672800575,-6.58813477959484 34.3060756457653,-6.60186768975109 34.314583149474,-6.62109376396984 34.3043740415805,-6.64169312920421 34.318553022848,-6.68426515068859 34.2538774367323,-6.70005799736828 34.2118684058451))',0
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

When I load this table in Spark I get :

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|BinaryGeometry                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[00 00 00 00 01 03 00 00 00 01 00 00 00 11 00 00 00 04 00 F0 00 DC CC 1A C0 87 14 01 81 1E 1B 41 40 FC FF EF 00 68 AA 1A C0 BF EE 57 20 85 19 41 40 04 00 F0 00 8C 86 1A C0 CC DC 8B DC AE 1A 41 40 FF FF EF 00 44 74 1A C0 CA 9D 5D 61 10 1C 41 40 FF FF EF 00 64 63 1A C0 BF 1F 98 0B 3A 1D 41 40 FF FF EF 00 44 47 1A C0 E4 6B A0 DD CE 1D 41 40 FC FF EF 00 D8 2B 1A C0 54 E4 71 67 6D 1C 41 40 FF FF EF 00 44 1A 1A C0 BF 1F 98 0B 3A 1D 41 40 02 00 F0 00 80 0B 1A C0 0D 80 00 13 2F 23 41 40 02 00 F0 00 B0 35 1A C0 CC F6 23 F8 BD 26 41 40 04 00 F0 00 0C 43 1A C0 73 1A 44 AF 16 26 41 40 02 00 F0 00 40 5A 1A C0 FF 54 9C 7C 2D 27 41 40 02 00 F0 00 50 68 1A C0 87 6E B9 42 44 28 41 40 02 00 F0 00 00 7C 1A C0 78 2B 85 BA F5 26 41 40 FC FF EF 00 18 91 1A C0 49 96 6F 58 C6 28 41 40 02 00 F0 00 B0 BC 1A C0 91 FA 4B 0E 7F 20 41 40 04 00 F0 00 DC CC 1A C0 87 14 01 81 1E 1B 41 40] |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

then to get the original data 'Polygon .........' I do the following code :

df.withColumn("geom",expr("ST_GeomFromWKB(BinaryGeometry)"));

But I get the following error :

20/08/10 22:28:50 ERROR Executor: Exception in task 87.0 in stage 39.0 (TID 929) java.lang.ClassCastException: [B cannot be cast to org.apache.spark.unsafe.types.UTF8String
at org.apache.spark.sql.geosparksql.expressions.ST_GeomFromWKB.eval(Constructors.scala:174)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_39$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)

EDIT3
使用Java语言在Spark中读取一个二进制列。

答案1

得分: 4

MySQL的几何类型并不真正存储为WKB,因此不能使用geospark的方法ST_GeomFromWKB来读取。

https://dev.mysql.com/doc/refman/5.7/en/gis-data-formats.html

在内部,MySQL以一种不与WKTWKB格式完全相同的格式存储几何值。(内部格式类似于WKB,但带有用于指示SRID的初始4个字节。)

解决方法是在加载数据框时指定查询,将geometry类型解析为WKT格式,使用内置函数ST_AsWKT

df = spark
  .read
  .format("jdbc")
  .options(
    Map(
      "driver" -> "com.mysql.cj.jdbc.Driver",
      "url" -> "jdbc:mysql://host:3306/db",
      "user" -> "user",
      "password" -> "password",
      "dbtable" -> "(select ST_AsWKT(BinaryGeometry) as BinaryGeometry from geo_table) as t")
  )
  .load
  .withColumn("BinaryGeometry",expr("ST_GeomFromWKT(BinaryGeometry)"))

或者,可以提取列并将其作为WKB使用:

select hex(ST_AsWKB(BinaryGeometry)) as BinaryGeometry from geo_table

然后将其传递给ST_GeomFromWKB方法。

scala> df.printShema()
root
 |-- BinaryGeometry: geometry (nullable = false)

scala> df.show()
+--------------------+
|      BinaryGeometry|
+--------------------+
|POLYGON ((-7.5783...|
+--------------------+
英文:

MySQL geometry types are not really stored as WKB and as such cannot be read using the geospark's method ST_GeomFromWKB.

https://dev.mysql.com/doc/refman/5.7/en/gis-data-formats.html
> Internally, MySQL stores geometry values in a format that is not identical to either WKT or WKB format. (Internal format is like WKB but with an initial 4 bytes to indicate the SRID.)

Solution to that is to specify the query when loading dataframe that will parse the geometry types into WKT format using the builtin function ST_AsWKT:

df = spark
  .read
  .format("jdbc")
  .options(
    Map(
      "driver" -> "com.mysql.cj.jdbc.Driver",
      "url" -> "jdbc:mysql://host:3306/db",
      "user" -> "user",
      "password" -> "password",
      "dbtable" -> "(select ST_AsWKT(BinaryGeometry) as BinaryGeometry from geo_table) as t")
  )
  .load
  .withColumn("BinaryGeometry",expr("ST_GeomFromWKT(BinaryGeometry)"))

Alternatively, the column can be extracted as WKB using

select hex(ST_AsWKB(BinaryGeometry)) as BinaryGeometry from geo_table

and fed to ST_GeomFromWKB method.

scala> df.printShema()
root
 |-- BinaryGeometry: geometry (nullable = false)

scala> df.show()
+--------------------+
|      BinaryGeometry|
+--------------------+
|POLYGON ((-7.5783...|
+--------------------+

huangapple
  • 本文由 发表于 2020年8月6日 01:26:50
  • 转载请务必保留本文链接:https://go.coder-hub.com/63270475.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定