SPARK: Parse JSON and get the Parent Column name when child column name is set to true
Question
Given the data below to parse:
{
  "properties": {
    "student_id": {
      "type": "string",
      "unique_id": true
    },
    "status": {
      "type": "boolean"
    },
    "name": {
      "type": "string"
    },
    "phone_number": {
      "type": "string",
      "ignore": true
    },
    "e_mail": {
      "type": "string",
      "ignore": true
    },
    "address": {
      "type": "string"
    }
  },
  "subjects": [
    "science",
    "english"
  ]
}
I am specifically looking for child columns "unique_id" and "ignore" and want to get the below output:
| unique_id | ignore |
| --- | --- |
| [student_id] | [phone_number, e_mail] |
The parent column names are always changing.
The child columns "unique_id" and "ignore" can be present under any of the parent columns.
I would like to know the names of the parent columns for which "unique_id" or "ignore" is true.
Answer 1
Score: 1
I use json4s for parsing, and something like this works.

**Imports and raw JSON**

```scala
import org.json4s.{ JBool, JField, JObject }
import org.json4s.jackson.JsonMethods.parse

val jsonString = "{\"properties\":{\"student_id\":{\"type\":\"string\",\"unique_id\":true},\"status\":{\"type\":\"boolean\"},\"name\":{\"type\":\"string\"},\"phone_number\":{\"type\":\"string\",\"ignore\":true},\"e_mail\":{\"type\":\"string\",\"ignore\":true},\"address\":{\"type\":\"string\"}},\"subjects\":[\"science\",\"english\"]}"
```

**Function to extract properties where key == true**

```scala
/** @param key  property names are returned where the property contains (key, true)
  * @param json raw String of JSON data
  */
def extractProperties(key: String, json: String): List[String] = {
  // parse String => JValue, then extract the fields of the top-level "properties" object
  val properties: List[JField] = parse(json) match {
    case JObject(fields) =>
      fields
        .collectFirst { case ("properties", JObject(props)) => props }
        .getOrElse(throw new IllegalArgumentException("no \"properties\" object found"))
    case _ => throw new IllegalArgumentException("expected a JSON object at the top level")
  }
  // keep the names of properties whose object contains (key, true);
  // properties that are not JSON objects are skipped instead of cast
  properties.collect {
    case (name, JObject(fields)) if fields.contains((key, JBool(true))) => name
  }
}
```

**Test it out**

```scala
val unique: List[String] = extractProperties("unique_id", jsonString)
// unique: List[String] = List(student_id)
val ignored: List[String] = extractProperties("ignore", jsonString)
// ignored: List[String] = List(phone_number, e_mail)
```
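
Since the question asks for both flags side by side, the same matching idea can be applied to several keys in one call. The following is only a minimal sketch, assuming json4s is on the classpath (it ships with Spark); `parentsByFlag` and `sample` are hypothetical names introduced for illustration and are not part of the original answer.

```scala
import org.json4s.{ JBool, JObject }
import org.json4s.jackson.JsonMethods.parse

// Hypothetical helper: for each flag in `keys`, list the property names
// whose object contains (flag, true).
def parentsByFlag(keys: Seq[String], json: String): Map[String, List[String]] = {
  // fields of the top-level "properties" object, or Nil when it is absent
  val properties = parse(json) match {
    case JObject(fields) =>
      fields.collectFirst { case ("properties", JObject(props)) => props }.getOrElse(Nil)
    case _ => Nil
  }
  keys.map { key =>
    key -> properties.collect {
      case (name, JObject(attrs)) if attrs.contains((key, JBool(true))) => name
    }
  }.toMap
}

// Shortened copy of the question's data, used only for this example
val sample =
  """{"properties":{"student_id":{"type":"string","unique_id":true},"status":{"type":"boolean"},"phone_number":{"type":"string","ignore":true},"e_mail":{"type":"string","ignore":true}}}"""

val flagged = parentsByFlag(Seq("unique_id", "ignore"), sample)
println(flagged("unique_id").mkString("[", ", ", "]")) // [student_id]
println(flagged("ignore").mkString("[", ", ", "]"))    // [phone_number, e_mail]
```

Unlike `extractProperties`, this sketch returns empty lists rather than throwing when "properties" is missing; whether that is preferable depends on how strictly the input should be validated.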