MongoDB v5… ingest of public business data. I need to do a titleCase() of the $name field. Works great to about 400,000 records, then errors


Question

I end up having anywhere from 1.5 million to 3 million documents in the collection. The ingest is from public government CSV data, aggregated into the gov_data.businesses collection. Everything is ALL CAPS. I aggregated the data to a new collection with the address.city and name fields run through $toLower. Now I need to title-case those fields. Using address.city instead of name in the following code takes a while (28 minutes) but succeeds. name, however, fails with TypeError: Cannot read properties of undefined (reading 'toUpperCase') after some 400,000 documents, at about 8 minutes. It feels like a data-size issue, but I've no idea. I'm relatively new to aggregations and coding in mongo/mongosh.

I borrowed the script from here: https://stackoverflow.com/questions/63113037/how-to-update-field-value-to-tittlecase-in-mongodb

  use gov_data
  function titleCase(str) {
    return str && str.toLowerCase().split(/\s/).map(function(word) {
      return word.replace(word[0], word[0].toUpperCase());
    }).join(' ');
  }
  console.log(titleCase(undefined));
  console.log(titleCase(""));
  console.log(titleCase(null));
  console.log(titleCase("NAMAR"));
  db.businesses.aggregate().forEach(function(doc) {
    db.businesses.bulkWrite(
      { "_id": doc._id },
      { "$set": { "name": titleCase(doc.name) } }
    );
  });


Answer 1

Score: 1


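You might have a document whose name starts or ends with a space, or contains consecutive spaces. split(/\s/) then produces an array such as [ '', 'toto' ]. For the empty string, word[0] is undefined, and calling toUpperCase on undefined throws the TypeError you are seeing. So this is a data-content issue rather than a data-size issue.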
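To see the failure in isolation, here is a minimal sketch that runs the question's titleCase in plain JavaScript (mongosh or Node.js). The sample strings are made up for illustration and are not taken from the dataset.

```javascript
// Original titleCase from the question, unchanged.
function titleCase(str) {
  return str && str.toLowerCase().split(/\s/).map(function (word) {
    return word.replace(word[0], word[0].toUpperCase());
  }).join(' ');
}

// Hypothetical sample names; only the whitespace pattern matters.
const samples = ['acme corp', ' acme corp', 'acme  corp', 'acme corp '];

// Run titleCase on each sample and record either the result or the error message.
const outcomes = samples.map(function (name) {
  try {
    return { name: name, ok: true, value: titleCase(name) };
  } catch (err) {
    return { name: name, ok: false, value: err.message };
  }
});

outcomes.forEach(function (o) {
  console.log(JSON.stringify(o.name), o.ok ? '->' : 'throws:', o.value);
});
```

Only the first sample succeeds; the leading, doubled, and trailing spaces each produce an empty "word" and throw. In mongosh, a query along the lines of db.businesses.find({ name: /^\s|\s\s|\s$/ }) should list the documents that trigger the error.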

You should update your titleCase function to handle this. You could do as follows (warning: it will remove the extra spaces):

  function titleCase(str) {
    return str && str.toLowerCase().split(/\s/).reduce(function(element, word) {
      if (word.length > 0) {
        element.push(word.replace(word[0], word[0].toUpperCase()));
      }
      return element;
    }, []).join(' ');
  }

This should fix your issue.
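If collapsing the extra spaces is not acceptable, one alternative is to upper-case the first character of each run of non-whitespace characters with a regular expression. This is a sketch and not part of the original answer; titleCaseKeepSpaces is a hypothetical name.

```javascript
// Hypothetical alternative: title-case a string without touching its whitespace.
function titleCaseKeepSpaces(str) {
  // Pass null, undefined, and other non-string values through unchanged.
  if (typeof str !== 'string') {
    return str;
  }
  // \S+ matches each run of non-whitespace characters, so empty "words" never occur.
  return str.toLowerCase().replace(/\S+/g, function (word) {
    return word.charAt(0).toUpperCase() + word.slice(1);
  });
}

console.log(JSON.stringify(titleCaseKeepSpaces(' acme  corp ')));
console.log(titleCaseKeepSpaces('NAMAR'));
console.log(titleCaseKeepSpaces(undefined));
```

Because the callback only ever sees non-empty matches, no length check is needed, and leading, trailing, and repeated spaces are preserved as they were.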

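On top of that, you should consider using the updateMany function to update all documents in one command instead of iterating with forEach. Note that a shell function cannot be called directly inside the pipeline: titleCase("$name") would run once on the client against the literal string "$name". One option is to wrap the function in the $function operator (available since MongoDB 4.4, and requiring server-side JavaScript to be enabled) so that it runs on the server for each document. The filter skips documents whose name is missing or not a string:

  db.businesses.updateMany(
    { "name": { "$type": "string" } },
    [{ "$set": { "name": {
      "$function": { "body": titleCase, "args": ["$name"], "lang": "js" }
    } } }]
  );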
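If the loop is kept on the client, the number of round trips can still be reduced by sending the updates in batches. The sketch below only builds the operations array in the shape db.collection.bulkWrite() expects. buildTitleCaseOps and the batch size of 1000 are assumptions for illustration, and the mongosh calls are left in comments because they need a live database.

```javascript
// Whitespace-safe titleCase, as in the answer above.
function titleCase(str) {
  return str && str.toLowerCase().split(/\s/).reduce(function (element, word) {
    if (word.length > 0) {
      element.push(word.replace(word[0], word[0].toUpperCase()));
    }
    return element;
  }, []).join(' ');
}

// Hypothetical helper: turn documents into updateOne operations for bulkWrite.
// Documents without a string name are skipped.
function buildTitleCaseOps(docs) {
  return docs
    .filter(function (doc) { return typeof doc.name === 'string'; })
    .map(function (doc) {
      return {
        updateOne: {
          filter: { _id: doc._id },
          update: { $set: { name: titleCase(doc.name) } }
        }
      };
    });
}

// In mongosh, the helper could be used roughly like this (untested sketch):
//   let batch = [];
//   db.businesses.find({ name: { $type: "string" } }, { name: 1 }).forEach(function (doc) {
//     batch.push(doc);
//     if (batch.length === 1000) {
//       db.businesses.bulkWrite(buildTitleCaseOps(batch), { ordered: false });
//       batch = [];
//     }
//   });
//   if (batch.length > 0) {
//     db.businesses.bulkWrite(buildTitleCaseOps(batch), { ordered: false });
//   }

// Local check with made-up documents.
const ops = buildTitleCaseOps([
  { _id: 1, name: ' acme  corp' },
  { _id: 2 } // no name: skipped
]);
console.log(JSON.stringify(ops));
```

Unlike the single-object call in the question, bulkWrite takes an array of operation documents, each wrapped in a key such as updateOne.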

huangapple
  • Posted on July 18, 2023 at 03:21:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76707521.html