英文:
Expanding JSON to flat file in M Code(Power Query) for Notion Dashboards
问题
Disclaimer
我正在寻找一种在Mcode中展开JSON对象的方法。然而,我也足够谦虚,知道可能有更好的方法。考虑到这一点,我已经提供了背景信息,并且愿意考虑我是否完全走错了方向。
我知道这是一个很长的问题。我花了4小时准备了一个测试用的公共数据库,并清理了代码,以便其他人可以尽可能轻松地参与并提供帮助。感谢您花时间来阅读 🙂
背景
我喜欢Notion,但它缺乏强大的图形工具。我解决这个问题的方法是手动导出CSV文件,然后使用Excel和Power Query来转换数据,然后将其上传到 flourish.studio
总体目标
目标是在Notion中拥有我的数据,自动更新美观的图表,然后将其嵌入到Notion中。
虽然这个问题的主要目标不是这个,但我也愿意考虑其他解决方案。(我尝试过Google Looker Studio,但并不满意)
当前问题
Notion允许手动导出CSV。但是,这是手动的。
幸运的是,他们也有一个强大的API。不幸的是,它返回一个深层嵌套的JSON对象,Power Query很难将其展开为有用的CSV结构。
目前的进展
-
通过在Stack Overflow上进行研究和试错,我编写了一些M-Code,当提供数据库ID和API密钥时,它会获取JSON文件并将其引入到Power Query中。
QueryDatabaseFromNotion(databaseId as text, apiSecretKey as text)
-
我有一个自定义函数
ConvertJSONBranchToCSV(DataTable as table, ColumnName as text)
它将大多数属性类型转换为它们的 'raw value'。有2个主要限制
限制1: 已解决,必须逐个应用于每个列。我现在正在努力找到一种将其自动应用于每个列的方法。(我有6个数据库,每个数据库最多有80列)
限制2: 我不确定如何将ONE-TO-MANY属性转换为带逗号的字符串。 (请参阅我的代码中“TYPE 4. COMPLEX:”注释下面的部分)
问题总结
-
最终目标是在Notion中获得强大的图表功能。我所有的努力是否都朝着正确的方向?
-
假设这是一个有效的方法,我们应该继续努力将Notion完美地融入到Power Query中,您可以帮助我解决上述的限制1和限制2吗?
您可以在以下链接找到我的Excel文档、所有代码和示例数据库:
https://docs.google.com/spreadsheets/d/1U_ImHIp3h83oXKZoupKb9BhnsTQDBAf2/edit?usp=sharing&ouid=111108397944570955531&rtpof=true&sd=true
对于那些有兴趣快速查看我的代码的人,我已经在下面包含了它。
英文:
Disclaimer
I am looking for a way to flatten a JSON object in Mcode. However, I am also modest enough to know they may be a better way entirely. With that in mind, I have provided context and am open to the possibility that I am digging down the entirely wrong rabbit hole.
I know its a long question. I spent 4 hours preparing a test public database and cleaning up the code so it is as easy as possible for others to jump in and help. Thank you for taking the time 🙂
Background
I love Notion but it does lack powerful graphing tools. My workaround for this is manually exporting CSVs and using Excel and Power Query to transform the data before uploading it to flourish.studio
Overarching Goal
The goal is to have my data in notion, automatically updating beautiful diagrams that can then be embedded in Notion.
While not the main goal of this question I am also open to other solutions. (I have tried Google Looker Studio and was not impressed)
The current problem
Notion does allow for manual CSV export. However, this is manual.
Luckily they also have a powerful API. Unluckily it returns a deeply nested JSON object that Power Query really struggles to flatten into a useful CSV like structure.
Progress so far
-
Through research on stack overflow and trial and error, I have written some M-Code that when provided with a databaseId and APIKey grabs the JSON file and pulls it into Power Query.
QueryDatabaseFromNotion(databaseId as text, apiSecretKey as text)
-
I have a custom function
ConvertJSONBranchToCSV(DataTable as table, ColumnName as text)
That converts most type of properties to their 'raw value'. There are 2 main limitations
Limitation 1: SOLVED It must be applied individually to each column. I am now struggling to find a programmatic apply it to each column. (I have 6 databases, and each database has up to 80 columns)
Limitation 2: I am not sure how to convert ONE-TO-MANY properties to a string with commas. (See section under "TYPE 4. COMPLEX:" comment in my code)
Summary of Questions
-
The ultimate goal is to get powerful diagraming into notion. Am I going down the wrong path with all of this?
-
Assuming this is a valid path and we should continue trying to bring notion nicely into power query, can you help me with Limitations 1 and 2 above?
You can find my excel document all the code and sample database here:
https://docs.google.com/spreadsheets/d/1U_ImHIp3h83oXKZoupKb9BhnsTQDBAf2/edit?usp=sharing&ouid=111108397944570955531&rtpof=true&sd=true
For those interested in quickly looking over my code I have included it below
<!-- begin snippet: js hide: true console: true babel: false -->
<!-- language: lang-html -->
let
getRawValueFromJsonTree = (DataTable as table, ColumnName as text) =>
let
#"output" =
let
// =========== STEP 1. Dig one level deeper if property is 'rollup' or 'formula' =====================================================================================================================
// Columns which are 'rollup' or 'formula' can then be any other type.
// So this step is to dig one layer deep, so the rest of the code can run as normal.
// E.g. For 'number' properities the path = results{0}.properties.Title Column Name.number
// E.g Form 'formula' properties that result in a number the path = results{0}.properties.Title Column Name.formula.number
intialTypeStr =
try
// This assumes that the datatype of the first record is the same for all records.
// TODO This is not true for icon and cover.
Table.ExpandRecordColumn(DataTable, ColumnName, {"type"})[type]{0}
catch(err) =>
if err[Message] <> null then "Error: Already Raw Data" else "Pass(This is never shown)",
typeStr =
if intialTypeStr = "rollup" then
// Note that Rollup Calculations will all return nulls if your secretAPIKey does not also have access to that database
Table.ExpandRecordColumn(DataTable, ColumnName, {"rollup"})[rollup]{0}[type]
else if intialTypeStr = "formula" then
Table.ExpandRecordColumn(DataTable, ColumnName, {"formula"})[formula]{0}[type]
else if intialTypeStr = null then
"Error: No Type Property"
else
intialTypeStr,
#"BaseLevel" =
if intialTypeStr = "formula" then
Table.ExpandRecordColumn(DataTable, ColumnName, {"formula"}, {ColumnName})
else if intialTypeStr = "rollup" then
Table.ExpandRecordColumn(DataTable, ColumnName, {"rollup"}, {ColumnName})
else
DataTable
in
// =========== STEP 2. Extract the 'Raw value' from the JSON tree ======================================================================================================================================
if typeStr = "Error: Already Raw Data" then
// =========== TYPE 1. EASY: Already raw data. No chage require ==================================================================================================================================
#"BaseLevel" // Column is already a raw type. No changes required
else if typeStr = "Error: No Type Property" then
// This occurs with properties that all databases have. E.g. "created_by" (Fun fact the are actually not stored under 'properties' in the JSON)
Table.ExpandRecordColumn(DataTable, ColumnName, {"id"}, {ColumnName})
// =========== TYPE 2. EASY: Retrieving the value of property types that are ONLY 1 LAYER DEEP ====================================================================================================
// First lets get the easy wins.
// E.g. For 'number' properities the path = results{0}.properties.Title Column Name.number
// E.g. For 'string' properities the path = results{0}.properties.Title Column Name.string
else if
typeStr = "checkbox"
or typeStr = "boolean" // This seems to only happen when a formula results in a checkbox
or typeStr = "string"
or typeStr = "email"
or typeStr = "number"
or typeStr = "phone_number"
or typeStr = "url"
or typeStr = "created_time"
or typeStr = "last_edited_time"
or typeStr = "database_id"
then
Table.ExpandRecordColumn(#"BaseLevel", ColumnName, {typeStr}, {ColumnName})
// ELSE
// =========== TYPE 3. MEDIUM: Retrieving the value of property types that have UNIQUE PATHS =======================================================================================================
// Next we do the more complicated ones with multiple layers
// E.g. For 'title' properities the path = results{0}.properties.Title Column Name.title{0}.plain_text
else if typeStr = "title" then
let
#"A" = Table.ExpandRecordColumn(#"BaseLevel", ColumnName, {typeStr}, {ColumnName}),
#"B" = Table.ExpandListColumn(#"A", ColumnName),
#"C" = Table.ExpandRecordColumn(#"B", ColumnName, {"plain_text"}, {ColumnName})
in
#"C"
else if
typeStr = "status"
or typeStr = "select"
then
let
#"A" = Table.ExpandRecordColumn(#"BaseLevel", ColumnName, {typeStr}, {ColumnName}),
#"B" = Table.ExpandRecordColumn(#"A", ColumnName, {"name"}, {ColumnName})
in
#"B"
else if typeStr = "unique_id" then
let
#"A" = Table.ExpandRecordColumn(#"BaseLevel", ColumnName, {typeStr}, {ColumnName}),
#"B" = Table.ExpandRecordColumn(#"A", ColumnName, {"number"}, {ColumnName})
in
#"B"
else if typeStr = "rich_text" then
let
#"A" = Table.ExpandRecordColumn(#"BaseLevel", ColumnName, {typeStr}, {ColumnName}),
#"B" = Table.ExpandListColumn(#"A", ColumnName),
#"C" = Table.ExpandRecordColumn(#"B", ColumnName, {"plain_text"}, {ColumnName})
in
#"C"
else if typeStr = "date" then
let
#"A" = Table.ExpandRecordColumn(#"BaseLevel", ColumnName, {typeStr}, {ColumnName}),
#"START" = Table.ExpandRecordColumn(#"A", ColumnName, {"start"}, {ColumnName}),
#"END" = Table.ExpandRecordColumn(#"A", ColumnName, {"end"}, {ColumnName}),
// TODO Note that if start or end does not include time, then the date is not in the standard format yyyy-mm-ddThh:mm:ss.mmmm+{timezone offset}
//#"B" = Text.Combine({Text.From(#"START"), Text.From(#"END")}, " to ") // TODO I am currently just ignoring END
#"B" = #"START"
in
#"B"
else if
typeStr = "created_by"
or typeStr = "last_edited_by"
then
let
#"A" = Table.ExpandRecordColumn(#"BaseLevel", ColumnName, {typeStr}, {ColumnName}),
#"B" = Table.ExpandRecordColumn(#"A", ColumnName, {"id"}, {ColumnName})
in
#"B"
// =========== TYPE 4. COMPLEX: Retrieving the value of property types that are ONE-TO-MANY =========================================================================================================
else if typeStr = "array" then
// TRODO: Clean this up, Rollups with arrays can have confusing paths. Overlap with "title" property type above
// results{0}.properties.Title Column Name.rollup.array.{[{List_Object!}]}.title}.{[{List_Object!}]}.plain_text
// results{0}.properties.Title Column Name.rollup.array.{[{List_Object!}]}.{GetType()}.{[{List_Object!}]}.plain_text
// results{0}.properties.Title Column Name.rollup.array.{[{List_Object!}]}.{GetType()}.{[{List_Object!}]}.{GetType()}.content
let
#"A" = Table.ExpandRecordColumn(#"BaseLevel", ColumnName, {typeStr}, {ColumnName}),
#"B" = Table.ExpandListColumn(#"A", ColumnName), // NOT USED. This creates multuple lines and that is bad.
//#"C" = Table.ExpandRecordColumn(#"B", ColumnName, {"id"}, {ColumnName})
//#"C" = Table.ExpandRecordColumn(#"B", ColumnName, {"plain_text"}, {ColumnName})
rollupTypeStr = Table.ExpandRecordColumn(#"B", ColumnName, {"type"}){0}[type],
#"DigBasedOnRollupType" = if rollupTypeStr = "title" then
// For Rollup type = show_original
let
#"W" = #"B",
#"X" = Table.ExpandRecordColumn(#"W", ColumnName, {rollupTypeStr}, {ColumnName}),
#"Y" = Table.ExpandListColumn(#"X", ColumnName), // NOT USED. This creates multuple lines and that is bad.
#"Z" = Table.ExpandRecordColumn(#"Y", ColumnName, {"plain_text"}, {ColumnName})
in
//#"Z"
#"X"
else if rollupTypeStr = null then
// For Rollups Type = Sum
Table.ExpandRecordColumn(#"B", ColumnName, {"id"}, {ColumnName})
else
//#"B"
rollupTypeStr
in
//#"DigBasedOnRollupType"
#"A"
else if typeStr = "relation" then
// TODO: this has the 'has_more' property. I suspect this means there is some pagination that needs to be included in the future when there are more then 100 relations. - And I know there will be ;(
let
#"A" = Table.ExpandRecordColumn(#"BaseLevel", ColumnName, {typeStr}, {ColumnName}),
#"B" = Table.ExpandListColumn(#"A", ColumnName), // NOT USED. This creates multuple lines and that is bad.
#"C" = Table.ExpandRecordColumn(#"B", ColumnName, {"id"}, {ColumnName})
in
//#"C"
#"A"
else if typeStr = "files" then
let
#"A" = Table.ExpandRecordColumn(#"BaseLevel", ColumnName, {typeStr}, {ColumnName}),
#"B" = Table.ExpandListColumn(#"A", ColumnName) // NOT USED. This creates multuple lines and that is bad.
in
#"A"
else if typeStr = "multi_select" then
let
#"A" = Table.ExpandRecordColumn(#"BaseLevel", ColumnName, {typeStr}, {ColumnName}),
#"B" = Table.ExpandListColumn(#"A", ColumnName) // NOT USED. This creates multuple lines and that is bad.
in
#"A"
else if typeStr = "relation" then
#"BaseLevel" // I havent had any luck yet
else if typeStr = "people" then
let
#"A" = Table.ExpandRecordColumn(#"BaseLevel", ColumnName, {typeStr}, {ColumnName}),
#"B" = Table.ExpandListColumn(#"A", ColumnName) // NOT USED. This creates multuple lines and that is bad.
in
#"A"
else
#"BaseLevel" // TODO: If a column type is not recognised then I should show a error to the user. But not break everything
in
#"output"
in
getRawValueFromJsonTree
<!-- end snippet -->
You can take a look at the JSON object returned by querying a notion database here: JSON returned from Notion API:
答案1
得分: 1
With regard to Limitation 1, you can use List.Accumulate
to cycle through the relevant columns.
- 创建要转换的列的列表
- 有很多方法可以做到这一点
- 我只是在你的
#"Removed other columns
步骤中排除了最后一列
- 然后使用如下所示的
List.Accumulate
let
Source = #"Get Notion Database in JSON",
#"Removed Other Columns" = Table.SelectColumns(Source,
{"Person ColumnName", "Relations ColumnName", "Rollup Sum ColumnName", "Date ColumnName", "Formula Boolean ColumnName", "Formula Number ColumnName", "Email ColumnName", "Sequential ID ColumnName", "Status ColumnName", "Rollup Original ColumnName", "Checkbox ColumnName", "Formula Text ColumnName", "Created by ColumnName", "Multi-select ColumnName", "Formula Date ColumnName", "Last edited time ColumnName", "Number ColumnName", "Select ColumnName", "Phone ColumnName", "URL ColumnName", "Files & media ColumnName", "Rollup Latest Date ColumnName", "Text ColumnName", "Rollup Count ColumnName", "Created time ColumnName", "Last edited by ColumnName", "Title ColumnName", "url"}),
// 创建要转换的列的列表
#"Convert Columns" = List.RemoveLastN(Table.ColumnNames(#"Removed Other Columns"),1),
// 使用 List.Accumulate 展开所有这些列
#"Convert JSON" =
List.Accumulate(
#"Convert Columns",
#"Removed Other Columns",
(state, current) => ConvertJSONBranchToCSV(state, current))
in
#"Convert JSON"
英文:
With regard to Limitation 1, you can use List.Accumulate
to cycle through the relevant columns.
- Create a List of the columns to convert
- Many ways to do this
- I just excluded the last column in your
#"Removed other columns
step
- Then use
List.Accumulate
as shown below
let
Source = #"Get Notion Database in JSON",
#"Removed Other Columns" = Table.SelectColumns(Source,
{"Person ColumnName", "Relations ColumnName", "Rollup Sum ColumnName", "Date ColumnName", "Formula Boolean ColumnName", "Formula Number ColumnName", "Email ColumnName", "Sequential ID ColumnName", "Status ColumnName", "Rollup Original ColumnName", "Checkbox ColumnName", "Formula Text ColumnName", "Created by ColumnName", "Multi-select ColumnName", "Formula Date ColumnName", "Last edited time ColumnName", "Number ColumnName", "Select ColumnName", "Phone ColumnName", "URL ColumnName", "Files & media ColumnName", "Rollup Latest Date ColumnName", "Text ColumnName", "Rollup Count ColumnName", "Created time ColumnName", "Last edited by ColumnName", "Title ColumnName", "url"}),
//create list of columns to convert
#"Convert Columns" = List.RemoveLastN(Table.ColumnNames(#"Removed Other Columns"),1),
//expand all of those columns using List.Accumulate
#"Convert JSON" =
List.Accumulate(
#"Convert Columns",
#"Removed Other Columns",
(state, current) => ConvertJSONBranchToCSV(state, current))
in
#"Convert JSON"
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论