How to get every text into an array with Cheerio in Google Apps Script

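Question

I want to collect every piece of text into an array. However, some elements have class names that contain " " or "(", so I can't use find() with them. How can I get all of the texts into an array? Thank you for any help.

HTML source code: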

<div id="main-0-QuoteHeader-Proxy">
    <div class="Mb($m-module-24)">
        <div class="D(f) Ai(c) Mb(6px)">
            <h1 class="C($c-link-text) Fw(b) Fz(24px) Mend(8px)">1.name</h1>
            <span class="C($c-icon) Fz(24px) Mend(20px)">2.id</span>
            <div class="Flxg(2)">
                <a class="Td(n) Px(8px) Py(3px) Fz(12px) Fw(b) Bdrs(11px) C(#188fff) Bgc($tag-bg-blue) Bgc($tag-bg-blue-hover):h" href="/class-quote?sectorId=40&amp;exchange=TAI">3.cag</a>
            </div>
            <button class="Ff(buttonFont) O(n) Bd Bxsh(n) Trsdu(.3s) Whs(nw) C(#fff) Bdc($c-button) Bgc($c-button) Cur(p) D(f) Ai(c) Fx(n) Bdrs(100px) Px(15px) Py(3px) Lh(20px) Fz(14px) Fw(b) Mstart(8px) Bdw(2px)">
                <svg class="Cur(p)" width="16" style="fill:#fff;stroke:#fff;stroke-width:0;vertical-align:bottom" height="16" viewBox="0 0 24 24" data-icon="star"><path d="M8.485 7.83l-6.515.21c-.887.028-1.3 1.117-.66 1.732l4.99 4.78-1.414 6.124c-.2 1.14.767 1.49 1.262 1.254l5.87-3.22 5.788 3.22c.48.228 1.464-.097 1.26-1.254l-1.33-6.124 4.962-4.78c.642-.615.228-1.704-.658-1.732l-6.486-.21-2.618-6.22c-.347-.815-1.496-.813-1.84.003L8.486 7.83zm7.06 6.05l1.11 5.11-4.63-2.576L7.33 18.99l1.177-5.103-4.088-3.91 5.41-.18 2.19-5.216 2.19 5.216 5.395.18-4.06 3.903z"></path></svg>
                <span class="Mstart(8px)">4.add</span>
            </button>
        </div>
        <div class="D(f) Jc(sb) Ai(fe)">
            <div class="D(f) Fld(c) Ai(fs)">
                <div class="D(f) Ai(fe) Mb(4px)">
                    <span class="Fz(32px) Fw(b) Lh(1) Mend(16px) D(f) Ai(c) C($c-trend-down)">5.num</span>
                    <span class="Fz(20px) Fw(b) Lh(1.2) Mend(4px) D(f) Ai(c) C($c-trend-down)">
                        <span class="Mend(4px) Bds(s)" style="border-color:#00ab5e transparent transparent transparent;border-width:9px 6.5px 0 6.5px"></span>
                        6.up
                    </span>
                    <span class="Jc(fe) Fz(20px) Lh(1.2) Fw(b) D(f) Ai(c) C($c-trend-down)">7.down</span>
                </div>
                <span class="C(#6e7780) Fz(12px) Fw(b)">8.2023/02/17 13:30</span>
            </div>
            <div class="D(f)">
                <div class="D(f) Fld(c) Ai(c) Fw(b) Pend(8px) Bdendc($bd-primary-divider) Bdends(s) Bdendw(1px)">
                    <span class="Fz(16px) C($c-link-text) Mb(4px)">9.total num</span>
                    <span class="Fz(12px) C($c-icon)">10.total</span>
                </div>
                <div class="D(f) Fld(c) Ai(c) Fw(b) Px(8px) Bdendc($bd-primary-divider) Bdends(s) Bdendw(1px)">
                    <span class="Fz(16px) C($c-link-text) Mb(4px)">11.other num</span>
                    <span class="Fz(12px) C($c-icon)">12.other</span>
                </div>
                <div class="D(f) Fld(c) Ai(c) Fw(b) Pstart(8px)">
                    <span class="Fz(16px) Mb(4px) C($c-trend-up)">
                        <div class="D(f)">
                            13.(
                            <span class="D(f) Ai(c)">
                                <span class="Mend(4px) Bds(s)" style="border-color:transparent transparent #ff333a transparent;border-width:0 5px 7px 5px"></span>
                                14.100%
                            </span>
                            15.)
                        </div>
                    </span>
                    <span class="Fz(12px) C($c-icon)">16.count</span>
                </div>
            </div>
        </div>
    </div>
</div>

My code:

function getTextToArray()
{
  var URL = "https://tw.stock.yahoo.com/d/s/major_2330.html";
  var source = UrlFetchApp.fetch(URL);
  var html = source.getContentText();

  const $ = Cheerio.load(html, { decodeEntities: false });

  // Logs the text of the first div as a single concatenated string
  Logger.log($('#main-0-QuoteHeader-Proxy').find('div').first().text());
}

<details>
<summary>English:</summary>

I want to get every text into an array. But some elements have " " or "(" in the class name, so I can't use find() with them.
How can I get all of them into an array? Thank you for any help.

#my code:

function getTextToArray()
{
  var URL = "https://tw.stock.yahoo.com/d/s/major_2330.html";
  var source = UrlFetchApp.fetch(URL);
  var html = source.getContentText();

  const $ = Cheerio.load(html, { decodeEntities: false });

  Logger.log($('#main-0-QuoteHeader-Proxy').find('div').first().text());
}


This result only shows the single line "1.name2.id3.cag4.add5.num6.up7.down8.2023/02/17 13:309.total num10.total11.other num12.other13.(14.100%15.)".
I also can't use the class name with find(); it throws the error "Error: Unmatched selector: ($c-link-text) Fw(b) Fz(24px) Mend(8px)".
I want the result to be an array like array[0] = 1.name, array[1] = 2.id, array[2] = 3.cag, and so on.
</details>
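
The "Unmatched selector" error quoted in the question most likely comes from using the raw class string as a selector: parentheses, `$`, and spaces all have special meaning in CSS selector syntax. One possible workaround, sketched below, is to match each class token with a quoted attribute selector (`[class~="..."]`) instead of a `.class` selector. The helper names `classTokenSelector` and `classSelector` are hypothetical and not part of Cheerio; the sketch only builds the selector string.

```javascript
// Build an attribute selector that matches elements whose class list
// contains `token` exactly, e.g. 'Fz(24px)' -> '[class~="Fz(24px)"]'.
// (Hypothetical helper, not a Cheerio function.)
function classTokenSelector(token) {
  // Escape backslashes and double quotes so the quoted value stays valid.
  const escaped = token.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return `[class~="${escaped}"]`;
}

// Turn a full class attribute value into a compound selector that
// requires every whitespace-separated token to be present.
function classSelector(className) {
  return className.trim().split(/\s+/).map(classTokenSelector).join('');
}

console.log(classSelector('C($c-link-text) Fw(b) Fz(24px) Mend(8px)'));
```

Assuming `$` is the object returned by `Cheerio.load(html)`, the resulting string could then be passed to it, for example `$(classSelector('C($c-link-text) Fw(b)')).text()`.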
# Answer 1
**Score**: 0
`$('h1,span').get().map(el => $(el).text())`
This code selects all `<h1>` and `<span>` elements on the page, gets the text content of each one, and returns those texts as an array.
<details>
<summary>English:</summary>
It looks like you want:
$('h1,span').get().map(el => $(el).text())
</details>
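
Note that `$('h1,span')` does not match the `<a>` element holding "3.cag", and calling `.text()` on a span that contains other spans returns the nested text again (the "13.( 14.100% 15.)" group in the question's HTML is such a case). An alternative is to walk the parsed tree and collect the text nodes themselves, which sidesteps class names entirely. The sketch below assumes the node shape produced by Cheerio's parser (text nodes with `type: 'text'` and `data`, elements with a `children` array); `collectTexts` is a hypothetical helper, and the `sample` tree is a hand-built stand-in so the block runs without Cheerio.

```javascript
// Recursively collect trimmed, non-empty text from a parsed node tree.
// Assumed node shape: text nodes have `type: 'text'` and `data`,
// element nodes have a `children` array.
function collectTexts(node, out = []) {
  if (node.type === 'text') {
    const text = node.data.trim();
    if (text) out.push(text); // skip whitespace-only nodes
  } else if (Array.isArray(node.children)) {
    node.children.forEach(child => collectTexts(child, out));
  }
  return out;
}

// Hand-built stand-in for a small fragment of the question's HTML.
const sample = {
  type: 'tag', name: 'div', children: [
    { type: 'tag', name: 'h1', children: [{ type: 'text', data: '1.name' }] },
    { type: 'text', data: '\n  ' },
    { type: 'tag', name: 'span', children: [
      { type: 'tag', name: 'span', children: [] },
      { type: 'text', data: '\n    6.up\n  ' },
    ] },
  ],
};

console.log(collectTexts(sample)); // [ '1.name', '6.up' ]
```

In Apps Script, after `const $ = Cheerio.load(html)`, the same function could presumably be called as `collectTexts($('#main-0-QuoteHeader-Proxy').get(0))`, which should yield one array entry per text fragment in document order.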

huangapple
  • Posted on 2023-02-18 09:41:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75490659.html