How to change string content of a node that has child elements?

Question

I'm writing a Python script with BeautifulSoup that replaces all of the text on a page with something else.

So far it's going well, but I run into trouble whenever I encounter a node that contains both a string and another node.

As an example, here is some sample HTML:

   <div>
        abc
        <p>xyz</p>
   </div>

What I want to do is change the "abc" part of the HTML without affecting the remaining content of the node.

As you probably already know, element.string in BeautifulSoup only works for nodes that have a single child. In this example the <div> node has more than one child (the text and the <p> tag), so trying to use the string attribute ends in a runtime error saying that NoneType has no such attribute.

Is there a way to work around the string attribute and change only the text portion of a node in this scenario?
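
For reference, the behaviour described above can be reproduced with a minimal sketch like the one below. It assumes the sample HTML from the question and the built-in 'html.parser'; the variable names p_text, div_text and error_name are only illustrative. It shows that .string returns None for the <div>, and that the error appears once a method is called on that None.

```python
from bs4 import BeautifulSoup

html_doc = "<div>\n    abc\n    <p>xyz</p>\n</div>"
soup = BeautifulSoup(html_doc, "html.parser")

# <p> has exactly one child, so .string returns that text node.
p_text = soup.div.p.string

# <div> has several children (text and the <p> tag), so .string is None.
div_text = soup.div.string

# Calling a method on that None is what raises the error.
try:
    soup.div.string.replace_with("Hello World")
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

print(repr(p_text), div_text, error_name)
```

Note that assigning to soup.div.string instead would not raise, but it would replace every child of the <div>, including the <p> tag, which is not what is wanted here.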

Answer 1

Score: 0

You can access the individual children of the <div> tag through its .contents property and then use .replace_with() to put the new text in place:

from bs4 import BeautifulSoup

html_doc = '''\
<div>
    abc
    <p>xyz</p>
</div>'''

soup = BeautifulSoup(html_doc, 'html.parser')

# contents[0] is the text node in front of <p>, including its whitespace.
# Replace it with new text that keeps the same indentation.
soup.div.contents[0].replace_with('\n    Hello World\n    ')
print(soup)

Prints:

<div>
    Hello World
    <p>xyz</p>
</div>
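
Indexing contents[0] relies on the text being the first child of the tag. As a possible generalisation (a sketch, not part of the original answer), the loop below walks over the direct children and replaces every non-blank text node while keeping its surrounding whitespace. The helper name replace_direct_text is hypothetical; it only uses the .contents and .replace_with() calls shown above plus the NavigableString and Comment classes from bs4.

```python
from bs4 import BeautifulSoup, Comment, NavigableString


def replace_direct_text(tag, new_text):
    """Replace each non-blank text node that is a direct child of `tag`.

    The leading and trailing whitespace of each original node is kept.
    Returns the number of text nodes that were replaced.
    """
    replaced = 0
    # Iterate over a copy because replace_with() mutates tag.contents.
    for child in list(tag.contents):
        # Skip child tags and comments; only plain text nodes are changed.
        if not isinstance(child, NavigableString) or isinstance(child, Comment):
            continue
        stripped = child.strip()
        if not stripped:
            continue  # whitespace-only node, e.g. the newline after </p>
        start = child.index(stripped)
        lead = child[:start]
        trail = child[start + len(stripped):]
        child.replace_with(lead + new_text + trail)
        replaced += 1
    return replaced


html_doc = "<div>\n    abc\n    <p>xyz</p>\n</div>"
soup = BeautifulSoup(html_doc, "html.parser")
count = replace_direct_text(soup.div, "Hello World")
print(count)
print(soup)
```

Because only direct children are visited, the text inside the nested <p> tag is left untouched; to change that as well, the helper would have to be called on each descendant tag.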

Posted by huangapple on 2023-02-24 08:14:17
When reposting, please keep the link to this article: https://go.coder-hub.com/75551551.html