MongoDB v5 ingest of public business data: I need to titleCase() the `name` field. It works fine up to about 400,000 records, then errors
Question
I end up with anywhere from 1.5 million to 3 million documents in the collection. The data is ingested from public government CSV files and aggregated into the `gov_data.businesses` collection. Everything is in ALL CAPS. I aggregated the data into a new collection with the `address.city` and `name` fields passed through `$toLower`. Now I need to title-case those fields.

Using `address.city` instead of `name` in the following code takes a while (28 minutes) but succeeds. `name`, however, fails with `TypeError: Cannot read properties of undefined (reading 'toUpperCase')` after some 400,000 documents, at about 8 minutes. It feels like a data-size issue, but I have no idea. I'm relatively new to aggregations and to coding in mongo/mongosh.
I borrowed the script from here: https://stackoverflow.com/questions/63113037/how-to-update-field-value-to-tittlecase-in-mongodb
```javascript
use gov_data

function titleCase(str) {
  return str && str.toLowerCase().split(/\s/).map(function(word) {
    return word.replace(word[0], word[0].toUpperCase());
  }).join(' ');
}

console.log(titleCase(undefined));
console.log(titleCase(""));
console.log(titleCase(null));
console.log(titleCase("NAMAR"));

db.businesses.aggregate().forEach(function(doc){
  db.businesses.bulkWrite(
    { "_id": doc._id },
    { "$set": { "name": titleCase(doc.name) } }
  );
});
```
Answer 1
Score: 1
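You might have a document whose name starts or ends with a space, or contains consecutive spaces (any whitespace counts, since the split uses `/\s/`). `split(/\s/)` then returns an array containing empty strings, such as `[ '', 'toto' ]`. For an empty word, `word[0]` is `undefined`, and calling `toUpperCase()` on `undefined` throws exactly the `TypeError` you see. So this is not a data-size issue: the loop stops at the first document with such a name, which in your data comes after roughly 400,000 documents.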
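You can reproduce this outside MongoDB by running the question's original `titleCase` in Node.js on a few names. The names below are made up for illustration; the real values in `gov_data.businesses` are unknown. The sketch also defines `hasEmptyWord`, a hypothetical helper that flags names `split(/\s/)` would break into empty words. The same pattern could serve as a query filter in mongosh to locate the offending documents, for example `db.businesses.find({ name: /^\s|\s\s|\s$/ })`.

```javascript
// The question's original titleCase, unchanged.
function titleCase(str) {
  return str && str.toLowerCase().split(/\s/).map(function (word) {
    return word.replace(word[0], word[0].toUpperCase());
  }).join(" ");
}

// Hypothetical helper: true if split(/\s/) would produce an empty word,
// i.e. the string has leading, trailing, or consecutive whitespace.
const hasEmptyWord = (s) => /^\s|\s\s|\s$/.test(s);

// Hypothetical sample names, not taken from the real data set.
const samples = ["acme corp", " acme corp", "acme  corp", "acme corp "];

for (const name of samples) {
  const parts = name.split(/\s/);
  let result;
  try {
    result = titleCase(name);
  } catch (err) {
    // Every sample flagged by hasEmptyWord ends up here with a TypeError.
    result = `${err.name}: ${err.message}`;
  }
  console.log(JSON.stringify(name), JSON.stringify(parts), hasEmptyWord(name), result);
}
```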
You should update your `titleCase` function to skip those empty words. You could do it as follows (note that this collapses repeated whitespace into single spaces and drops leading and trailing whitespace):
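```javascript
function titleCase(str) {
  // null, undefined, and "" are returned unchanged by the && short-circuit.
  return str && str.toLowerCase().split(/\s/).reduce(function (words, word) {
    // split(/\s/) yields "" for leading, trailing, or repeated whitespace;
    // skipping those means word[0] is never undefined.
    if (word.length > 0) {
      words.push(word.replace(word[0], word[0].toUpperCase()));
    }
    return words;
  }, []).join(' ');
}
```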
This should fix your issue.
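If you would rather keep the original spacing, one alternative is a regex replace. This technique is not from the answer, and `titleCasePreserveSpacing` is a hypothetical name. It upper-cases the first non-whitespace character at the start of the string or after any whitespace character. Because nothing is split, no empty words can occur.

```javascript
// Sketch: title-case without splitting, so the original whitespace is kept.
// Non-strings (null, undefined, numbers) are returned as-is.
function titleCasePreserveSpacing(str) {
  if (typeof str !== "string") return str;
  return str
    .toLowerCase()
    // (^|\s) captures the start of the string or a whitespace character,
    // (\S) the first non-whitespace character that follows it.
    .replace(/(^|\s)(\S)/g, (match, sep, ch) => sep + ch.toUpperCase());
}

console.log(titleCasePreserveSpacing("ACME  CORP"));
console.log(titleCasePreserveSpacing(" SMITH & SONS "));
```

Unlike the answer's version, this keeps leading, trailing, and repeated whitespace exactly as stored. Which one to use depends on whether you also want to normalize the spacing.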
On top of that, you should consider using `updateMany` to update all documents in a single operation, instead of iterating with `forEach` and sending one write per document.
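```javascript
// titleCase() is a mongosh (client-side) function, so it cannot be applied to
// "$name" directly. $function (MongoDB 4.4+) sends its body to the server and
// runs it on each document; this requires server-side JavaScript to be enabled.
db.businesses.updateMany(
  { name: { $type: "string" } },  // skip documents whose name is missing or not a string
  [{ $set: { name: { $function: { body: titleCase, args: ["$name"], lang: "js" } } } }]
);
```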
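Passing `titleCase("$name")` straight into an update pipeline does not do what it reads like. mongosh evaluates the function once, on the client, with the literal string `"$name"` as its argument, before the command is sent. The sketch below reproduces that evaluation in plain JavaScript with the answer's `titleCase`. The result is `"$name"` again, so the stage sent to the server would be `{ $set: { name: "$name" } }`. That simply writes each document's `name` back unchanged.

```javascript
// The answer's fixed titleCase (skips empty words).
function titleCase(str) {
  return str && str.toLowerCase().split(/\s/).reduce(function (words, word) {
    if (word.length > 0) {
      words.push(word.replace(word[0], word[0].toUpperCase()));
    }
    return words;
  }, []).join(" ");
}

// What mongosh would build before sending the update: the call is evaluated
// here, on the client, not once per document on the server.
const stage = { $set: { name: titleCase("$name") } };

console.log(JSON.stringify(stage)); // {"$set":{"name":"$name"}}
```

The question's `forEach` loop does not have this problem, because there `titleCase` receives each document's actual `doc.name` value.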