What's the best way to transform Google Sheets data to an array of objects?

# Question

I am using Google Spreadsheets as a simple "backend" for a personal web application. Everything is working pretty well, but while developing this code I had one doubt about performance.

It is not very relevant to my application, but I wanted to understand the best way to achieve what I did, or whether I was actually doing it in a reasonably good fashion.

**The problem:**

The Google Sheets API returns me something like this as JSON:
"values": [
[
"name",
"country",
"age"
],
[
"John",
"Brazil",
"21"
],
[
"Jane",
"Canada",
"27"
],
]
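For context, a response of this shape typically comes from the Sheets API v4 `spreadsheets.values.get` endpoint. A minimal sketch of such a request follows; the spreadsheet ID, range, and API key are placeholders, and the sheet is assumed to be readable with a plain API key:

```javascript
// Sketch: fetching a range from the Google Sheets API v4 values endpoint.
// SPREADSHEET_ID, RANGE, and API_KEY are placeholder values.
const SPREADSHEET_ID = "your-spreadsheet-id";
const RANGE = "Sheet1!A1:C3";
const API_KEY = "your-api-key";

const url =
  `https://sheets.googleapis.com/v4/spreadsheets/${SPREADSHEET_ID}` +
  `/values/${encodeURIComponent(RANGE)}?key=${API_KEY}`;

fetch(url)
  .then((res) => res.json())
  .then((data) => {
    // data.values is the array of arrays shown above, header row first
    console.log(data.values);
  });
```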
But in the end I would prefer to have something like this:

```json
"values": [
  { "name": "John", "country": "Brazil", "age": "21" },
  { "name": "Jane", "country": "Canada", "age": "27" }
]
```
This is the code I'm using to make that transformation:
```javascript
const values = [
  ["name", "country", "age"],
  ["John", "Brazil", "21"],
  ["Jane", "Canada", "27"],
];

const adjustValues = (array) => {
  // First row holds the column headers; note this mutates the input
  const indexes = array.shift();
  const returnArray = [];
  // Zip each remaining row with the headers to build an object
  array.forEach((value) => {
    const returnObject = {};
    value.forEach((column, index) => {
      returnObject[indexes[index]] = column;
    });
    returnArray.push(returnObject);
  });
  return returnArray;
};

console.log(adjustValues(values));
```
Is this a good way to transform that data? What would be the best way?
# Answer 1

**Score**: 1

Overall, I think your solution makes sense. However, I believe a more JavaScript-friendly approach would be to replace both `forEach` loops with a `map` and a `reduce`.

Here's your function transformed to use `map` + `reduce` instead of `forEach`:
```javascript
const adjustValues = (array) => {
  const fieldNames = array[0];
  const reducer = (soFar, value, fieldIndex) => {
    soFar[fieldNames[fieldIndex]] = value;
    return soFar;
  };
  return array.slice(1).map((row) => {
    return row.reduce(reducer, {});
  });
};
```
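For example, running it against the sample data from the question produces the desired shape, and the input array is left intact:

```javascript
const values = [
  ["name", "country", "age"],
  ["John", "Brazil", "21"],
  ["Jane", "Canada", "27"],
];

console.log(adjustValues(values));
// [
//   { name: "John", country: "Brazil", age: "21" },
//   { name: "Jane", country: "Canada", age: "27" }
// ]

console.log(values.length); // 3 -- the header row is still there
```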
Here are some items I changed from the original function:

- Used `fieldNames` instead of `indexes`, as `fieldNames` is a better description.
- Did not use `Array.shift()`, because it modifies the received parameter in place. Try printing the parameter before and after calling your function and you'll notice the "headers" disappeared (see the sketch after this list).
- Leveraged `map` and `reduce` because they explicitly show the intention: either mapping values from one array into another array, or reducing values from an array into a different structure.
- Used `row` instead of `values` to make it clear that I'm reading a "row" from the spreadsheet.
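To make that second point concrete, here is a small sketch (using the question's sample data) of how `shift()` mutates the caller's array, and a non-destructive alternative:

```javascript
const data = [
  ["name", "country", "age"],
  ["John", "Brazil", "21"],
];

// Array.shift() removes the first element from the original array:
data.shift();             // returns ["name", "country", "age"]
console.log(data.length); // 1 -- the header row is gone from `data`

// A non-destructive way to split off the header:
const copy = [
  ["name", "country", "age"],
  ["John", "Brazil", "21"],
];
const [header, ...rows] = copy;
console.log(copy.length); // still 2 -- `copy` is untouched
```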
I did not run any performance tests or benchmarks.
All this being said, I believe your best course of action would be to leverage streams. Most likely your spreadsheet reader is receiving a stream or a CSV. If you need to load thousands of records, you might end up consuming an insane amount of memory, which could crash your application if there are concurrent operations.

Basically, your process requires twice as much memory as the original content. Consider:

- You have 10 rows (excluding the header).
- While this data is in memory, you start creating a new object that will hold 10 more rows.

Because of this, take a look at the `stream-csv-as-json` library. Since it uses a stream, the CSV content is read line by line and a new object is created for each line, effectively reducing the memory requirements.
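As a rough illustration of the streaming idea, here is a sketch using Node's built-in `readline` module instead of that library; it assumes a hypothetical `data.csv` whose first line is the header and whose fields contain no quoted commas. Each row becomes an object and is handed off as soon as it is read, so only one row needs to be held in memory at a time:

```javascript
const fs = require("fs");
const readline = require("readline");

async function processCsv(path, onRecord) {
  const rl = readline.createInterface({
    input: fs.createReadStream(path),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });

  let fieldNames = null;
  for await (const line of rl) {
    const cells = line.split(",");
    if (!fieldNames) {
      fieldNames = cells; // first line is the header row
      continue;
    }
    // Build one object and hand it off immediately instead of
    // accumulating the whole result array in memory.
    const record = {};
    cells.forEach((cell, i) => {
      record[fieldNames[i]] = cell;
    });
    onRecord(record);
  }
}

processCsv("data.csv", (record) => console.log(record));
```

A proper CSV parser such as `stream-csv-as-json` does the same thing while also handling quoting and escaping correctly.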