Divide the HTML tags of an article into separate columns using Apps Script in Google Sheets

Question

Using Apps Script, I need to break an article down into columns.

The first column should hold whatever the `<h1>` is, the second column the first `<h2>`, and the next column all the `<p>` paragraphs that go underneath it. The column after that holds the second `<h2>`, followed by a column with all of its `<p>` paragraphs, and so on until it reaches the seventh `<h2>`. At that point, if part of the article still remains, I want to dump the rest of the article into the last column. So for an article with more than seven `<h2>` headings, everything that remains goes into that last column.

I am using the Apps Script code below to get the job done, but **this code has two issues.**

First, it doesn't dump the last `<p>` paragraphs that come after the last `<h2>` (`<h2>Conclusion</h2>`) of an article into the last column.

Second, the script gives this error on some articles.

TypeError: Cannot read properties of null (reading 'match')
(anonymous) @ Code.gs:44
(anonymous) @ Code.gs:43
myFunction @ Code.gs:41

[Here][2] is the spreadsheet link, which has the articles and the Apps Script. Row 3 shows the output I want, and Row 4 shows what the script produces. The only difference between Row 3 and Row 4 is that Row 4 doesn't have the last `<p>` paragraphs that come after the last `<h2>` (`<h2>Conclusion</h2>`) of an article in the last column.

```javascript
function onOpen() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var pasteSheet = [ {name: "Run Script", functionName: "myFunction"}];
  ss.addMenu("Break Down Article", pasteSheet);
}

function myFunction() {
  const sheetName = "Sheet1"; // Please set the sheet name.

  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(sheetName);
  const range = sheet.getRange("A2:A" + sheet.getLastRow());
  const temp1 = range.getValues().map(([a]) =>
    a.split("\n").reduce((o, e, i, a) => {
      if (e) {
        let tagMatch = e.match(/^<(.*?)>/);
        let tag = tagMatch ? tagMatch[1].toUpperCase() : "";
        if (tag != "P") {
          if (o.temp.length > 0) {
            o.res.push(o.temp);
            o.temp = [];
          }
          o.res.push([e]);
        } else if (i == a.length - 1) {
          if (tag == "P") {
            o.temp.push(e);
            o.res.push(o.temp);
          } else {
            o.res.push([e]);
          }
        } else {
          o.temp.push(e);
        }
      }
      return o;
    }, { res: [], temp: [] }).res
  ).map(e => e.map(f => f.join("\n")));
  const maxLen = Math.max(...temp1.map(r => r.length));

  // I modified the script below.
  const values = temp1.map(r => [...r, ...Array(maxLen - r.length).fill(null)]);
  const res = values.map(r => {
    let n = 0;
    const idx = r.map(e => {
      let tagMatch = e.match(/^<(.*?)>/);
      let tag = tagMatch ? tagMatch[1].toUpperCase() : "";
      return tag && tag.includes("H2") ? ++n : 0;
    }).indexOf(7);
    return [...r.splice(0, idx), r.join("\n")];
  });
  range.offset(0, 2, res.length, res[0].length).setValues(res);
}
```

Answer 1

Score: 2

From your provided sample Spreadsheet, how about the following sample script?

Based on your sample values, I moved the step that pads each row of the 2-dimensional array to the same length so that it runs after the array processing. I also added a condition that checks the value returned by `e.match(/^<(.*?)>/)` before it is used.
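To illustrate why the order matters, here is a minimal sketch that runs outside of Google Sheets. It assumes a row that has already been padded with `null`, as in the question's script, and shows that calling `.match` on the padding cell throws the reported `TypeError`. The helper name `leadingTag` is hypothetical and is not part of the sample script.

```javascript
// Hypothetical helper: returns the upper-cased tag at the start of a cell,
// or "" when the cell is not a string or does not start with a tag.
function leadingTag(cell) {
  if (typeof cell !== "string") return ""; // padding cells are null
  const m = cell.match(/^<(.*?)>/);
  return m ? m[1].toUpperCase() : "";
}

// A row padded with null before the tags are inspected.
const padded = ["<h1>Title</h1>", "<p>Intro</p>", null];

// Unguarded access: null has no match method, so this throws a TypeError.
let threw = false;
try {
  padded.map(e => e.match(/^<(.*?)>/));
} catch (err) {
  threw = err instanceof TypeError;
}

// Guarded access: the padding cell simply yields an empty tag.
const tags = padded.map(leadingTag);
```

Padding the rows only after the tag checks, as described above, avoids the `null` cells altogether; the guard is an extra safety net.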

Sample script:

function myFunction() {
  const sheetName = "Sheet1"; // Please set the sheet name.

  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(sheetName);
  const range = sheet.getRange("A3:A" + sheet.getLastRow());

  // Split each article into lines; consecutive <p> lines are grouped into one cell.
  const temp1 = range.getValues().map(([a]) =>
    a.split("\n").reduce((o, e, i, a) => {
      // Check the match result before reading the tag name.
      let m = e.match(/^<(.*?)>/);
      let tag = "";
      if (m) {
        tag = m[1].toUpperCase();
      }
      if (tag != "P") {
        if (o.temp.length > 0) {
          o.res.push(o.temp);
          o.temp = [];
        }
        o.res.push([e]);
      } else if (i == a.length - 1) {
        if (tag == "P") {
          o.temp.push(e);
          o.res.push(o.temp);
        } else {
          o.res.push([e]);
        }
      } else {
        o.temp.push(e);
      }
      return o;
    }, { res: [], temp: [] }).res
  ).map(e => e.map(f => f.join("\n")));

  // Keep the cells before the 7th cell containing "H2" and join the rest into the last cell.
  // If there is no 7th such cell, all cells are kept and the last cell is empty.
  const res = temp1.map(r => {
    let n = 0;
    const idx = r.map(e => e ? (e.toUpperCase().includes("H2") ? ++n : 0) : 0).indexOf(7);
    return [...r.splice(0, idx > -1 ? idx : r.length), r.join("\n").trim()];
  });

  // Pad the rows to the same length after the processing above.
  const maxLen = Math.max(...res.map(r => r.length));
  const values = res.map(r => [...r, ...Array(maxLen - r.length).fill(null)]);
  range.offset(0, 2, values.length, values[0].length).setValues(values);
}
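
For testing the splitting logic without a spreadsheet, the same idea can be sketched as a pure function. This is a variant under stated assumptions, not the script above: the name `splitArticle` and the `maxH2` parameter are hypothetical, blank lines are skipped, a heading is detected only when a cell starts with an `<h2` tag, and no empty trailing cell is added when there are fewer than `maxH2` headings.

```javascript
// Hypothetical pure helper: turns one article (a string of HTML lines)
// into the list of cells for one spreadsheet row.
function splitArticle(html, maxH2 = 7) {
  // Step 1: group consecutive <p> lines; every other line gets its own cell.
  const cells = [];
  let paragraphs = [];
  const flush = () => {
    if (paragraphs.length > 0) {
      cells.push(paragraphs.join("\n"));
      paragraphs = [];
    }
  };
  for (const line of String(html).split("\n")) {
    if (!line.trim()) continue; // assumption: blank lines carry no content
    const m = line.match(/^<(.*?)>/);
    const tag = m ? m[1].toUpperCase() : "";
    if (tag === "P") {
      paragraphs.push(line);
    } else {
      flush();
      cells.push(line);
    }
  }
  flush(); // keep paragraphs that follow the last heading

  // Step 2: find the cell that holds the maxH2-th <h2>.
  let count = 0;
  const cut = cells.findIndex(c => /^<h2[\s>]/i.test(c) && ++count === maxH2);

  // Step 3: keep the cells before the cut and dump the rest into one last cell.
  if (cut === -1) return cells;
  return [...cells.slice(0, cut), cells.slice(cut).join("\n")];
}

// Usage with a small article and a limit of 2 headings instead of 7.
const sample = [
  "<h1>T</h1>",
  "<h2>A</h2>", "<p>a1</p>", "<p>a2</p>",
  "<h2>B</h2>", "<p>b1</p>",
  "<h2>C</h2>", "<p>c1</p>",
].join("\n");
const row = splitArticle(sample, 2);
```

With a limit of 2, everything from the second `<h2>` onward, including the paragraphs after the last heading, ends up in the final cell. The rows returned for several articles would still need padding to a common length before being passed to `setValues`.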

Note:

  • Based on your statement "I have 5 sample values. That's all for now. I don't have any other.", this sample script is written for your 5 sample values. Please keep this in mind.
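
One place where other articles might behave differently is the heading count: the sample script counts every cell whose text contains "H2" anywhere, so a paragraph that merely mentions "h2" would also be counted. The sketch below contrasts that loose check with a stricter one; `looseIsH2` and `strictIsH2` are hypothetical names, and the stricter regular expression is an assumption about how the headings are written.

```javascript
// Loose check, mirroring the condition used in the sample script.
const looseIsH2 = cell => Boolean(cell) && cell.toUpperCase().includes("H2");

// Stricter check (assumption): the cell must start with an <h2 ...> tag.
const strictIsH2 = cell => typeof cell === "string" && /^<h2[\s>]/i.test(cell);

const paragraph = "<p>Use an h2 for each section.</p>";
const heading = "<h2>Conclusion</h2>";

const looseResults = [looseIsH2(paragraph), looseIsH2(heading)];
const strictResults = [strictIsH2(paragraph), strictIsH2(heading)];
```

For the five sample values this difference may never show up, which is consistent with the note above.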

huangapple
  • Published on March 12, 2023 at 19:21:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75712754.html