February 6, 2023, 18:08:13

Python: read an HTML table from Confluence and print each row as a list

Question

I'd like to parse a Confluence page, read a table, and create a list for each row.

My table looks like this:

[image of the table not included]

My code:

from bs4 import BeautifulSoup

# `confluence` is an already-authenticated client and `p_id` is the page ID
x = confluence.get_page_by_id(p_id, expand="body.storage")
soup = BeautifulSoup(x["body"]["storage"]["value"], "html.parser")

for tables in soup.select("table tr"):
    data = [item.get_text() for item in tables.select("td")]
    print(data)

The problem is the multi-line column: because of the line breaks inside the cell, the code prints

['Karnataka', 'Bangalore', 'BangaloreMysoreTumkur']

and I want the output to look like

['Karnataka', 'Bangalore', 'Bangalore Mysore Tumkur']

Can you please provide the code to fix this?

Thanks for the help!

Answer 1

Score: 1

Since the HTML was not posted as text, I can't be sure what the cells contain, but you could try passing a separator to .get_text(); it is inserted between the text fragments when they are joined:

item.get_text(' ')
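To see why the words run together, it can help to look at the individual text fragments inside one cell. The sketch below is only an illustration: the cell markup is a hypothetical guess (the real Confluence storage markup was not posted), assuming the lines are separated by `<br/>` tags with no whitespace between them.

```python
from bs4 import BeautifulSoup

# Hypothetical cell markup; the actual page content is unknown.
cell = BeautifulSoup(
    "<td>Bangalore<br/>Mysore<br/>Tumkur</td>", "html.parser"
).td

# Each piece of text between tags is a separate string node.
fragments = [str(s) for s in cell.strings]

# With no argument, get_text() joins the fragments with an empty string.
default_text = cell.get_text()

# Passing a separator joins the same fragments with that string instead.
spaced_text = cell.get_text(" ")

print(fragments)
print(repr(default_text))
print(repr(spaced_text))
```

The `<br/>` tags carry no text of their own, so the default join produces `BangaloreMysoreTumkur`, while the separator version produces `Bangalore Mysore Tumkur`.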

Answer 2

Score: 1

By default, get_text() concatenates a cell's text fragments with no separator, so text that is split only by tags (rather than by whitespace) runs together. To use a custom separator, use this:

data = [item.get_text(separator=" ") for item in tables.select("td")]
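Putting the separator fix into the original loop might look like the following sketch. `SAMPLE_HTML` and the helper name `table_rows` are hypothetical: the sample stands in for `x["body"]["storage"]["value"]`, and its markup (header cells in `<th>`, one paragraph per line in the last cell) is an assumption about what the page contains. Adding `strip=True` is optional; it trims each fragment and drops whitespace-only ones so no stray spaces or newlines end up in the result.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for x["body"]["storage"]["value"].
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><th>State</th><th>Capital</th><th>Cities</th></tr>
    <tr>
      <td>Karnataka</td>
      <td>Bangalore</td>
      <td><p>Bangalore</p><p>Mysore</p><p>Tumkur</p></td>
    </tr>
  </tbody>
</table>
"""


def table_rows(html):
    """Return one list of cell texts per table row that has <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table tr"):
        # Join the text fragments of each cell with a single space.
        cells = [td.get_text(separator=" ", strip=True) for td in tr.select("td")]
        if cells:  # header rows made only of <th> yield no <td> cells
            rows.append(cells)
    return rows


for row in table_rows(SAMPLE_HTML):
    print(row)
```

For this sample the loop prints `['Karnataka', 'Bangalore', 'Bangalore Mysore Tumkur']`. With a real page, the sample string would be replaced by the storage value returned from the Confluence client.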
