Java Spark - how to generate a StructType from a JSON object

Question

How do I create a StructType from a JSON object in Java? In my case the JSON object is an Avro schema (I have truncated it below).

{"type":"record","name":"DataRecord","namespace":"com.mycode","fields":[{"name":"data","type":{"type":"record","name":"Data",
"fields":[{"name":"COUNT","type":[{"type":"null"},{"type":"int"}],"default":null},{"name":"VALUE","type":[{"type":"null"},{"type":"int"}],"default":null}]}}]}

I would prefer not to build the StructType object manually. I have seen ways to do this in Scala, but nothing similar in Java.

Answer 1

Score: 1

With "org.apache.spark" %% "spark-core" % "2.4.5" and "com.databricks" %% "spark-avro" % "3.2.0", I was able to parse the JSON schema string into an Avro Schema and then convert that Schema to a StructType.

import org.apache.avro.Schema;
import org.apache.spark.sql.types.StructType;
import com.databricks.spark.avro.SchemaConverters;

// Avro schema as a JSON string
String schemaStr = "{ \"type\" : \"record\", \"name\" : \"test_schema\", \"namespace\" : \"com.test.avro\", \"fields\" : [ { \"name\" : \"username\", \"type\" : \"string\", \"doc\"  : \"blah blah\" }, { \"name\" : \"tweet\", \"type\" : \"string\", \"doc\"  : \"test\" }, { \"name\" : \"timestamp\", \"type\" : \"long\", \"doc\"  : \"test\" } ], \"doc\" : \"test\" }";
// Schema.parse(String) is deprecated, so go through Schema.Parser
Schema schema = new Schema.Parser().parse(schemaStr);
// For a record schema the converted Spark type is a StructType
StructType requiredType = (StructType) SchemaConverters.toSqlType(schema).dataType();

The same conversion in Scala:

import org.apache.avro.Schema
import org.apache.spark.sql.types.StructType
import com.databricks.spark.avro.SchemaConverters

val schemaStr = "{ \"type\" : \"record\", \"name\" : \"test_schema\", \"namespace\" : \"com.test.avro\", \"fields\" : [ { \"name\" : \"username\", \"type\" : \"string\", \"doc\"  : \"blah blah\" }, { \"name\" : \"tweet\", \"type\" : \"string\", \"doc\"  : \"test\" }, { \"name\" : \"timestamp\", \"type\" : \"long\", \"doc\"  : \"test\" } ], \"doc\" : \"test\" }"
val schema = new Schema.Parser().parse(schemaStr)
val requiredType = SchemaConverters.toSqlType(schema).dataType.asInstanceOf[StructType]

Note: Newer versions of Spark (2.4 and later) provide Avro support through Spark's own "org.apache.spark" %% "spark-avro" module, which supersedes "com.databricks" %% "spark-avro". It still has to be added as a dependency, and the converter there is org.apache.spark.sql.avro.SchemaConverters.
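
For Spark 2.4 and later, the same approach should work with the Apache-maintained Avro module instead of the Databricks one. Below is a minimal sketch applied to the schema from the question. It assumes spark-sql and the "org.apache.spark" %% "spark-avro" artifact matching your Spark version are on the classpath. The class name AvroToStructType, the constant EXAMPLE_SCHEMA and the helper toStructType are hypothetical names chosen for illustration.

```java
import org.apache.avro.Schema;
import org.apache.spark.sql.avro.SchemaConverters;
import org.apache.spark.sql.types.StructType;

// Hypothetical helper class, named for illustration only.
final class AvroToStructType {

    // The Avro schema from the question, as a JSON string.
    static final String EXAMPLE_SCHEMA =
        "{\"type\":\"record\",\"name\":\"DataRecord\",\"namespace\":\"com.mycode\",\"fields\":["
      + "{\"name\":\"data\",\"type\":{\"type\":\"record\",\"name\":\"Data\",\"fields\":["
      + "{\"name\":\"COUNT\",\"type\":[{\"type\":\"null\"},{\"type\":\"int\"}],\"default\":null},"
      + "{\"name\":\"VALUE\",\"type\":[{\"type\":\"null\"},{\"type\":\"int\"}],\"default\":null}"
      + "]}}]}";

    // Parses an Avro schema given as JSON and converts it to a Spark StructType.
    // The cast assumes the top-level Avro type is a record.
    static StructType toStructType(String avroSchemaJson) {
        Schema avroSchema = new Schema.Parser().parse(avroSchemaJson);
        return (StructType) SchemaConverters.toSqlType(avroSchema).dataType();
    }

    public static void main(String[] args) {
        StructType structType = toStructType(EXAMPLE_SCHEMA);
        // Print the resulting Spark schema as a tree.
        structType.printTreeString();
    }
}
```

No SparkSession is needed for the conversion itself, so the helper can be called before any DataFrame is created, for example to pass the resulting StructType to a reader's schema(...) method.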
