How to get span text only from a table?
Question
In this HTML I am trying to parse the text fields and the impact, but the impact is not text; it is an image.
<td class="fxs_c_item fxs_c_time"><span>01:00</span></td>,
<td class="fxs_c_item fxs_c_flag"><span class="fxs_flag fxs_us" title="United States"></span></td>,
<td class="fxs_c_item fxs_c_currency"><span>USD</span></td>,
<td class="fxs_c_item fxs_c_name"><span>New Year's Day</span><span> <span></span></span></td>,
<td class="fxs_c_item fxs_c_impact"><span class="fxs_c_impact-icon fxs_c_impact-none"></span></td>,
<td class="fxs_c_item fxs_c_type" colspan="4"><span class="fxs_c_label fxs_c_label_info">All Day</span></td>,
<td class="fxs_c_item fxs_c_notify"></td>,
<td class="fxs_c_item fxs_c_dashboard" data-gtmid="features-calendar-eventdetails-eventoptions-4d3300ad-c168-4a5f-a4ac-a60a338e63c4"><span><svg aria-hidden="true" class="fxs_icon svg-inline--fa fa-ellipsis-h fa-w-16" data-icon="ellipsis-h" data-prefix="fas" focusable="false" role="img" viewbox="0 0 512 512" xmlns="http://www.w3.org/2000/svg"><path d="M328 256c0 39.8-32.2 72-72 72s-72-32.2-72-72 32.2-72 72-72 72 32.2 72 72zm104-72c-39.8 0-72 32.2-72 72s32.2 72 72 72 72-32.2 72-72-32.2-72-72-72zm-352 0c-39.8 0-72 32.2-72 72s32.2 72 72 72 72-32.2 72-72-32.2-72-72-72z" fill="currentColor"></path></svg></span></td>]
I am able to get all the table text with this line:
cols = [ele.text.strip() for ele in cols]
but substituting span.text does not work. For each row I need the impact, which is the span class value
fxs_c_impact-icon fxs_c_impact-none
I am trying to extract all the span text from a table:
data3 = []
table3 = soup.find('table', attrs={'class':'fxs_c_table'})
table_body3 = table3.find('tbody')
rows = table_body3.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.span.text for ele in cols]
    data3.append([ele for ele in cols if ele])
The span item looks like this:
<span class="fxs_c_impact-icon fxs_c_impact-medium"></span>
The error I get:
AttributeError: 'NoneType' object has no attribute 'text'
The script works if I extract the text from the text fields of the table, but I can't seem to extract this span value.
Answer 1
Score: 1
As mentioned in the comments, try to select the elements more specifically:
.find('span',{'class' : 'fxs_c_impact-icon'}).get('class')[-1]
The main issue is that one td does not have a span:
<td class="fxs_c_item fxs_c_notify"></td>
So ele.span is None for that cell, and you cannot call .text on it.
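To make the cause concrete, here is a minimal sketch that reproduces the empty cell and guards against it. The two-cell table is a cut-down version of the row from the question, and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Cut-down row: one cell with a <span>, one cell without.
html = '''
<table><tr>
<td class="fxs_c_item fxs_c_time"><span>01:00</span></td>
<td class="fxs_c_item fxs_c_notify"></td>
</tr></table>
'''
soup = BeautifulSoup(html, 'html.parser')

texts = []
for td in soup.find_all('td'):
    span = td.span  # None when the cell contains no <span>
    # Only read the text when a <span> was actually found.
    texts.append(span.get_text(strip=True) if span is not None else '')

print(texts)
```

Without the `is not None` check, the second cell would raise the same AttributeError as in the question.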
Example
from bs4 import BeautifulSoup
html = '''
<tr>
<td class="fxs_c_item fxs_c_time"><span>01:00</span></td>
<td class="fxs_c_item fxs_c_flag"><span class="fxs_flag fxs_us" title="United States"></span></td>
<td class="fxs_c_item fxs_c_currency"><span>USD</span></td>
<td class="fxs_c_item fxs_c_name"><span>New Year's Day</span><span> <span></span></span></td>
<td class="fxs_c_item fxs_c_impact"><span class="fxs_c_impact-icon fxs_c_impact-none"></span></td>
<td class="fxs_c_item fxs_c_type" colspan="4"><span class="fxs_c_label fxs_c_label_info">All Day</span></td>
<td class="fxs_c_item fxs_c_notify"></td>
<td class="fxs_c_item fxs_c_dashboard" data-gtmid="features-calendar-eventdetails-eventoptions-4d3300ad-c168-4a5f-a4ac-a60a338e63c4"><span><svg aria-hidden="true" class="fxs_icon svg-inline--fa fa-ellipsis-h fa-w-16" data-icon="ellipsis-h" data-prefix="fas" focusable="false" role="img" viewbox="0 0 512 512" xmlns="http://www.w3.org/2000/svg"><path d="M328 256c0 39.8-32.2 72-72 72s-72-32.2-72-72 32.2-72 72-72 72 32.2 72 72zm104-72c-39.8 0-72 32.2-72 72s32.2 72 72 72 72-32.2 72-72-32.2-72-72-72zm-352 0c-39.8 0-72 32.2-72 72s32.2 72 72 72 72-32.2 72-72-32.2-72-72-72z" fill="currentColor"></path></svg></span></td>
</tr>
'''
# Name the parser explicitly; otherwise bs4 picks one and emits a warning.
soup = BeautifulSoup(html, 'html.parser')

data = []
for e in soup.find_all('tr'):
    data.append(
        {
            'time': e.span.text,
            'title': e.find('span',{'class' : 'fxs_flag'}).get('title'),
            'currency': e.find('td',{'class' : 'fxs_c_currency'}).text,
            '...': '...',
            # The last class on the icon span carries the impact level.
            'impact': e.find('span',{'class' : 'fxs_c_impact-icon'}).get('class')[-1]
        }
    )

data
Output
[{'time': '01:00','title': 'United States','currency': 'USD', '...':'...','impact': 'fxs_c_impact-none'}]
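If you would rather keep the list-per-row shape of the loop in the question, one possible adaptation is sketched below. It assumes the impact level is always the last class on the icon span, as in the sample markup. The helper name `cell_value` and the second row (02:30, EUR) are made up for illustration and are not part of the original page.

```python
from bs4 import BeautifulSoup

# Sample table; the second row is invented for illustration.
html = '''
<table class="fxs_c_table"><tbody>
<tr>
<td class="fxs_c_item fxs_c_time"><span>01:00</span></td>
<td class="fxs_c_item fxs_c_currency"><span>USD</span></td>
<td class="fxs_c_item fxs_c_impact"><span class="fxs_c_impact-icon fxs_c_impact-none"></span></td>
<td class="fxs_c_item fxs_c_notify"></td>
</tr>
<tr>
<td class="fxs_c_item fxs_c_time"><span>02:30</span></td>
<td class="fxs_c_item fxs_c_currency"><span>EUR</span></td>
<td class="fxs_c_item fxs_c_impact"><span class="fxs_c_impact-icon fxs_c_impact-medium"></span></td>
<td class="fxs_c_item fxs_c_notify"></td>
</tr>
</tbody></table>
'''


def cell_value(td):
    """Return the impact class for icon cells, otherwise the cell text."""
    icon = td.find('span', class_='fxs_c_impact-icon')
    if icon is not None:
        # Assumption: the level (e.g. fxs_c_impact-none) is the last class.
        return icon.get('class')[-1]
    # get_text() works on the <td> itself, so cells without a <span> are safe.
    return td.get_text(strip=True)


soup = BeautifulSoup(html, 'html.parser')
table3 = soup.find('table', class_='fxs_c_table')

data3 = []
for row in table3.find('tbody').find_all('tr'):
    values = [cell_value(td) for td in row.find_all('td')]
    data3.append([v for v in values if v])  # drop empty cells, as in the question

print(data3)
```

Reading the text from the td instead of from td.span avoids the None case entirely, while the icon cell is handled separately because it has no text to read.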