Python: read an HTML table from Confluence and print each row as a list

Question


I'd like to parse a Confluence page, read a table, and create a list for each row.

My table looks like this:

[Image: screenshot of the table, not preserved]

My code:

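from bs4 import BeautifulSoup

# `confluence` is assumed to be an already-configured Confluence client; `p_id` is the page ID
x = confluence.get_page_by_id(p_id, expand="body.storage")

soup = BeautifulSoup(x["body"]["storage"]["value"], 'html.parser')

# Build one list of cell texts per table row
for tables in soup.select("table tr"):
    data = [item.get_text() for item in tables.select("td")]
    print(data)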

The problem is that one cell contains line breaks, so the code prints its values run together:

['Karnataka', 'Bangalore', 'BangaloreMysoreTumkur']

I want the output to look like this:

['Karnataka', 'Bangalore', 'Bangalore Mysore Tumkur']

Could you please provide code to fix this?

Thanks for the help!

Answer 1

Score: 1

Since the HTML was not provided as text, I can't see the exact contents, but you could try setting the separator parameter of .get_text():

item.get_text(' ')
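
As a quick check of what the separator does, here is a minimal sketch on a single cell. The markup in `cell_html` is an assumption (the question does not show the real HTML); it simply puts `<br/>` tags between the three values.

```python
from bs4 import BeautifulSoup

# Hypothetical cell markup: the question does not show the real HTML.
cell_html = "<td>Bangalore<br/>Mysore<br/>Tumkur</td>"
cell = BeautifulSoup(cell_html, "html.parser").td

joined_default = cell.get_text()    # text fragments joined with ""
joined_spaced = cell.get_text(" ")  # text fragments joined with " "

print(repr(joined_default))
print(repr(joined_spaced))
```

The first argument of `get_text()` is the string placed between the text fragments of the element, so passing `" "` keeps the values apart.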

Answer 2

Score: 1

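By default, get_text() joins the text fragments of an element with an empty string, so values separated only by tags (for example <br/> or <p>) run together. To use a custom separator, use this: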

data = [item.get_text(separator=" ") for item in tables.select("td")]
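
Putting it together, the following is a sketch of the whole loop under some assumptions: `SAMPLE_HTML` and its header names are made up for illustration, and `rows_as_lists` is a hypothetical helper, not part of BeautifulSoup or the Confluence client. It also passes `strip=True`, which trims each fragment and drops whitespace-only ones, and skips rows that have no `<td>` cells (such as header rows), which would otherwise print as empty lists.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for x["body"]["storage"]["value"];
# the real page content is not shown in the question.
SAMPLE_HTML = """
<table>
  <tr><th>State</th><th>Capital</th><th>Cities</th></tr>
  <tr>
    <td>Karnataka</td>
    <td>Bangalore</td>
    <td><p>Bangalore</p>
        <p>Mysore</p>
        <p>Tumkur</p></td>
  </tr>
</table>
"""


def rows_as_lists(html):
    """Return one list of cell texts per table row that contains <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table tr"):
        cells = tr.select("td")
        if not cells:  # header rows contain only <th>, so skip them
            continue
        # separator=" " keeps the values apart; strip=True trims each
        # fragment and drops whitespace-only fragments between tags
        rows.append([td.get_text(separator=" ", strip=True) for td in cells])
    return rows


for row in rows_as_lists(SAMPLE_HTML):
    print(row)
```

With the real page, the same function could be called on `x["body"]["storage"]["value"]` instead of `SAMPLE_HTML`.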

Posted by huangapple on 2023-02-06 18:08:13
Source: https://go.coder-hub.com/75359880.html