How to change string content of a node that has child elements?
Question
I'm trying to write a Python script with BeautifulSoup that replaces the text on a whole page with something else.
So far it's going well, but I run into trouble whenever I encounter a node that contains both a string and another node.
As an example, here is some sample HTML:
<div>
abc
<p>xyz</p>
</div>
What I want to do is change the "abc" part of the HTML without affecting the remaining content of the node.
As you probably already know, element.string in BeautifulSoup only works with nodes that have a single child. Since the <div> node in this example has more than one child (the text and the <p> tag), element.string is None, and trying to use it ends in an AttributeError on NoneType.
Is there a way to change the text portion of a node in this specific scenario without going through the string attribute?
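The behaviour described in the question can be reproduced with a short sketch. It assumes the sample HTML from the question and the built-in html.parser backend; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup from the question: a <div> holding text and a <p> tag.
html_doc = "<div>\nabc\n<p>xyz</p>\n</div>"
soup = BeautifulSoup(html_doc, "html.parser")

# .string is None when a tag has more than one child...
div_string = soup.div.string
# ...but works on the nested <p>, which has exactly one child.
p_string = soup.p.string

print(div_string)  # None
print(p_string)    # xyz

# The <div> actually holds three children: text, the <p> tag, and the
# trailing newline before </div>.
children = [type(child).__name__ for child in soup.div.contents]
print(children)  # ['NavigableString', 'Tag', 'NavigableString']
```

Reading .string does not itself raise; the error only shows up once something is called on the None it returns.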
Answer 1
Score: 0
You can access the individual children of the <div> tag through the .contents property and then use .replace_with() to put new text there:
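from bs4 import BeautifulSoup

html_doc = '''\
<div>
abc
<p>xyz</p>
</div>'''

soup = BeautifulSoup(html_doc, 'html.parser')

# contents[0] is the text node "\nabc\n" that precedes the <p> tag;
# keep the surrounding newlines so the layout of the markup is unchanged
soup.div.contents[0].replace_with('\nHello World\n')

print(soup)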
Prints:
<div>
Hello World
<p>xyz</p>
</div>
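Indexing contents[0] relies on the text being the first child. When that is not guaranteed, one possible variation is to loop over the direct children and replace only the plain text nodes. The sketch below is an assumption layered on top of the answer, not part of it: replace_direct_text is a hypothetical helper name, and it deliberately skips comments and whitespace-only nodes.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def replace_direct_text(tag, new_text):
    """Replace every non-blank text node that is a direct child of `tag`.

    Hypothetical helper: returns the number of text nodes replaced.
    """
    replaced = 0
    # Iterate over a copy, because replace_with() mutates tag.contents.
    for child in list(tag.contents):
        # `type(...) is` skips NavigableString subclasses such as Comment.
        if type(child) is NavigableString and child.strip():
            text = str(child)
            # Preserve the original leading and trailing whitespace.
            leading = text[:len(text) - len(text.lstrip())]
            trailing = text[len(text.rstrip()):]
            child.replace_with(leading + new_text + trailing)
            replaced += 1
    return replaced


html_doc = "<div>\nabc\n<p>xyz</p>\n</div>"
soup = BeautifulSoup(html_doc, "html.parser")

count = replace_direct_text(soup.div, "Hello World")
result = str(soup)
print(result)
```

Because only direct children are visited, the text inside the nested <p> tag is left alone; applying the helper to every tag on a page would cover the whole document.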