Java Spark - how to generate a StructType from a JSON object
Question
How do I create a StructType from a JSON object in Java? The JSON object in my case is an Avro schema (truncated below):
{"type":"record","name":"DataRecord","namespace":"com.mycode","fields":[{"name":"data","type":{"type":"record","name":"Data",
"fields":[{"name":"COUNT","type":[{"type":"null"},{"type":"int"}],"default":null},{"name":"VALUE","type":[{"type":"null"},{"type":"int"}],"default":null}]}}]}
I would prefer not to build the StructType object by hand. I have seen ways to do this in Scala, but nothing similar in Java.
Answer 1
Score: 1
With "org.apache.spark" %% "spark-core" % "2.4.5" and "com.databricks" %% "spark-avro" % "3.2.0", I was able to parse the JSON schema string into an Avro Schema and then convert that to a StructType.
import org.apache.avro.Schema;
import org.apache.spark.sql.types.StructType;
import com.databricks.spark.avro.SchemaConverters;
// Avro schema as a JSON string
String schemaStr = "{ \"type\" : \"record\", \"name\" : \"test_schema\", \"namespace\" : \"com.test.avro\", \"fields\" : [ { \"name\" : \"username\", \"type\" : \"string\", \"doc\" : \"blah blah\" }, { \"name\" : \"tweet\", \"type\" : \"string\", \"doc\" : \"test\" }, { \"name\" : \"timestamp\", \"type\" : \"long\", \"doc\" : \"test\" } ], \"doc\" : \"test\" }";
// Schema.Parser replaces the deprecated static Schema.parse(String)
Schema schema = new Schema.Parser().parse(schemaStr);
// toSqlType returns a SchemaType; its dataType is a StructType for a record schema
StructType requiredType = (StructType) SchemaConverters.toSqlType(schema).dataType();
The same conversion in Scala:
import org.apache.avro.Schema
import org.apache.spark.sql.types.StructType
import com.databricks.spark.avro.SchemaConverters
val schemaStr = "{ \"type\" : \"record\", \"name\" : \"test_schema\", \"namespace\" : \"com.test.avro\", \"fields\" : [ { \"name\" : \"username\", \"type\" : \"string\", \"doc\" : \"blah blah\" }, { \"name\" : \"tweet\", \"type\" : \"string\", \"doc\" : \"test\" }, { \"name\" : \"timestamp\", \"type\" : \"long\", \"doc\" : \"test\" } ], \"doc\" : \"test\" }"
val schema = new Schema.Parser().parse(schemaStr)
val requiredType = SchemaConverters.toSqlType(schema).dataType.asInstanceOf[StructType]
Note: newer versions of Spark (2.4 and later) provide this functionality in Spark's own spark-avro module (org.apache.spark.sql.avro.SchemaConverters), so the separate "com.databricks" %% "spark-avro" dependency should no longer be needed.
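As a minimal sketch, the same two steps can be applied to the schema from the question. This assumes Spark 2.4 or later with the org.apache.spark spark-avro module on the classpath, so SchemaConverters is imported from org.apache.spark.sql.avro rather than com.databricks.spark.avro. The class name AvroToStructType and the helper toStructType are hypothetical names chosen for illustration.

```java
import org.apache.avro.Schema;
import org.apache.spark.sql.avro.SchemaConverters;
import org.apache.spark.sql.types.StructType;

public class AvroToStructType {

    // The Avro schema from the question, escaped as a Java string literal.
    static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"DataRecord\",\"namespace\":\"com.mycode\","
        + "\"fields\":[{\"name\":\"data\",\"type\":{\"type\":\"record\",\"name\":\"Data\","
        + "\"fields\":[{\"name\":\"COUNT\",\"type\":[{\"type\":\"null\"},{\"type\":\"int\"}],\"default\":null},"
        + "{\"name\":\"VALUE\",\"type\":[{\"type\":\"null\"},{\"type\":\"int\"}],\"default\":null}]}}]}";

    // Hypothetical helper: parse the Avro schema JSON, then let Spark derive the SQL type.
    static StructType toStructType(String avroSchemaJson) {
        Schema avroSchema = new Schema.Parser().parse(avroSchemaJson);
        // For a record schema the converted data type is a StructType, so the cast is safe here.
        return (StructType) SchemaConverters.toSqlType(avroSchema).dataType();
    }

    public static void main(String[] args) {
        StructType structType = toStructType(SCHEMA_JSON);
        // Print the derived schema as an indented tree for inspection.
        System.out.println(structType.treeString());
    }
}
```

The two [null, int] unions should come out as nullable integer fields nested under data. Printing treeString() is a quick way to confirm that before passing the StructType to a reader or to createDataFrame.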