How to get span text only from a table?

Question

In this HTML I am trying to parse the text fields and the impact, but the impact is not text; it is an image:

[<td class="fxs_c_item fxs_c_time"><span>01:00</span></td>,
 <td class="fxs_c_item fxs_c_flag"><span class="fxs_flag fxs_us" title="United States"></span></td>,
 <td class="fxs_c_item fxs_c_currency"><span>USD</span></td>,
 <td class="fxs_c_item fxs_c_name"><span>New Year's Day</span><span> <span></span></span></td>,
 <td class="fxs_c_item fxs_c_impact"><span class="fxs_c_impact-icon fxs_c_impact-none"></span></td>,
 <td class="fxs_c_item fxs_c_type" colspan="4"><span class="fxs_c_label fxs_c_label_info">All Day</span></td>,
 <td class="fxs_c_item fxs_c_notify"></td>,
 <td class="fxs_c_item fxs_c_dashboard" data-gtmid="features-calendar-eventdetails-eventoptions-4d3300ad-c168-4a5f-a4ac-a60a338e63c4"><span><svg aria-hidden="true" class="fxs_icon svg-inline--fa fa-ellipsis-h fa-w-16" data-icon="ellipsis-h" data-prefix="fas" focusable="false" role="img" viewbox="0 0 512 512" xmlns="http://www.w3.org/2000/svg"><path d="M328 256c0 39.8-32.2 72-72 72s-72-32.2-72-72 32.2-72 72-72 72 32.2 72 72zm104-72c-39.8 0-72 32.2-72 72s32.2 72 72 72 72-32.2 72-72-32.2-72-72-72zm-352 0c-39.8 0-72 32.2-72 72s32.2 72 72 72 72-32.2 72-72-32.2-72-72-72z" fill="currentColor"></path></svg></span></td>]

I am able to get all the table text with this line:

cols = [ele.text.strip() for ele in cols]

but substituting span.text does not work. For each row, I need the impact, which is not text but the class value of the span:

fxs_c_impact-icon fxs_c_impact-none

This is how I am trying to extract all the span text from the table:

data3 = []
table3 = soup.find('table', attrs={'class':'fxs_c_table'})
table_body3 = table3.find('tbody')

rows = table_body3.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.span.text for ele in cols]
    data3.append([ele for ele in cols if ele])

The span item looks like this:

<span class="fxs_c_impact-icon fxs_c_impact-medium"></span>

The error I get:

AttributeError: 'NoneType' object has no attribute 'text'

The script works when I extract text from the text fields of the table, but I can't seem to extract this span value.

Answer 1

Score: 1

As mentioned in the comments, try to select the elements more specifically:

.find('span',{'class' : 'fxs_c_impact-icon'}).get('class')[-1]

The main issue is that one td does not contain a span:

<td class="fxs_c_item fxs_c_notify"></td>

So ele.span is None for that cell, and you cannot call .text on it.
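If you still want to loop over every cell, a guard against the missing span avoids the error. This is only a minimal sketch: the two-cell HTML snippet is a cut-down stand-in for the real row, and the names cells and span_texts are made up for illustration.

```python
from bs4 import BeautifulSoup

# Cut-down stand-in for one row: one cell with a <span>, one without
# (like the fxs_c_notify cell in the question).
html = '<table><tr><td><span>USD</span></td><td class="fxs_c_notify"></td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

cells = soup.find_all("td")

# td.span is None for the empty cell, so check before reading .text.
span_texts = [td.span.text if td.span is not None else "" for td in cells]
print(span_texts)  # ['USD', '']
```

Note that this only fixes the AttributeError. The impact span has no text at all, so its value still has to be read from the class attribute.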

Example
from bs4 import BeautifulSoup
html = '''
<tr>
<td class="fxs_c_item fxs_c_time"><span>01:00</span></td>
 <td class="fxs_c_item fxs_c_flag"><span class="fxs_flag fxs_us" title="United States"></span></td>
 <td class="fxs_c_item fxs_c_currency"><span>USD</span></td>
 <td class="fxs_c_item fxs_c_name"><span>New Year's Day</span><span> <span></span></span></td>
 <td class="fxs_c_item fxs_c_impact"><span class="fxs_c_impact-icon fxs_c_impact-none"></span></td>
 <td class="fxs_c_item fxs_c_type" colspan="4"><span class="fxs_c_label fxs_c_label_info">All Day</span></td>
 <td class="fxs_c_item fxs_c_notify"></td>
 <td class="fxs_c_item fxs_c_dashboard" data-gtmid="features-calendar-eventdetails-eventoptions-4d3300ad-c168-4a5f-a4ac-a60a338e63c4"><span><svg aria-hidden="true" class="fxs_icon svg-inline--fa fa-ellipsis-h fa-w-16" data-icon="ellipsis-h" data-prefix="fas" focusable="false" role="img" viewbox="0 0 512 512" xmlns="http://www.w3.org/2000/svg"><path d="M328 256c0 39.8-32.2 72-72 72s-72-32.2-72-72 32.2-72 72-72 72 32.2 72 72zm104-72c-39.8 0-72 32.2-72 72s32.2 72 72 72 72-32.2 72-72-32.2-72-72-72zm-352 0c-39.8 0-72 32.2-72 72s32.2 72 72 72 72-32.2 72-72-32.2-72-72-72z" fill="currentColor"></path></svg></span></td>
</tr>
'''
# Name the parser explicitly; omitting it makes bs4 emit a warning.
soup = BeautifulSoup(html, 'html.parser')

data = []

for e in soup.find_all('tr'):
    data.append(
        {
            # The first span in the row holds the time.
            'time': e.span.text,
            # The country name is stored in the title attribute of the flag span.
            'title':  e.find('span',{'class' : 'fxs_flag'}).get('title'),
            'currency': e.find('td',{'class' : 'fxs_c_currency'}).text,
            '...': '...',
            # class is a list of class names; the last one encodes the impact level.
            'impact': e.find('span',{'class' : 'fxs_c_impact-icon'}).get('class')[-1]
        }
    )

data
Output
[{'time': '01:00','title': 'United States','currency': 'USD', '...':'...','impact': 'fxs_c_impact-none'}]
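The same idea can be applied to every row of the table from the question. The following is a sketch under a few assumptions: the HTML is a shortened stand-in for the real page, the second row (02:30, EUR) is invented for illustration, and impact_level and PREFIX are hypothetical helper names rather than part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the real table; the second row is invented.
html = '''
<table class="fxs_c_table"><tbody>
<tr>
  <td class="fxs_c_item fxs_c_time"><span>01:00</span></td>
  <td class="fxs_c_item fxs_c_currency"><span>USD</span></td>
  <td class="fxs_c_item fxs_c_impact"><span class="fxs_c_impact-icon fxs_c_impact-none"></span></td>
  <td class="fxs_c_item fxs_c_notify"></td>
</tr>
<tr>
  <td class="fxs_c_item fxs_c_time"><span>02:30</span></td>
  <td class="fxs_c_item fxs_c_currency"><span>EUR</span></td>
  <td class="fxs_c_item fxs_c_impact"><span class="fxs_c_impact-icon fxs_c_impact-medium"></span></td>
  <td class="fxs_c_item fxs_c_notify"></td>
</tr>
</tbody></table>
'''
soup = BeautifulSoup(html, "html.parser")

PREFIX = "fxs_c_impact-"  # hypothetical constant: common prefix of the impact classes


def impact_level(row):
    """Return the impact level ('none', 'medium', ...) or None if the span is missing."""
    span = row.find("span", class_="fxs_c_impact-icon")
    if span is None:
        return None
    # class is multi-valued, so .get("class") returns a list of names.
    for cls in span.get("class", []):
        if cls.startswith(PREFIX) and cls != "fxs_c_impact-icon":
            return cls[len(PREFIX):]
    return None


data3 = []
table3 = soup.find("table", class_="fxs_c_table")
for row in table3.find("tbody").find_all("tr"):
    data3.append(
        {
            "time": row.find("td", class_="fxs_c_time").get_text(strip=True),
            "currency": row.find("td", class_="fxs_c_currency").get_text(strip=True),
            "impact": impact_level(row),
        }
    )

print(data3)
# [{'time': '01:00', 'currency': 'USD', 'impact': 'none'},
#  {'time': '02:30', 'currency': 'EUR', 'impact': 'medium'}]
```

Searching the class list for the prefix, rather than taking the last item with [-1], keeps the result stable if the site ever changes the order of the class names.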

huangapple
  • Published on June 11, 2023 at 22:08:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76450870.html