Python: read an HTML table from Confluence and print each row as a list
Question
I'd like to parse a Confluence page, read a table, and create a list for each row.
My table looks like:
My code:
from bs4 import BeautifulSoup
x = confluence.get_page_by_id(p_id, expand="body.storage")
soup = BeautifulSoup(x["body"]["storage"]["value"], "html.parser")
for tables in soup.select("table tr"):
    data = [item.get_text() for item in tables.select("td")]
    print(data)
The problem is that, because of the line breaks in the second column, the output of the code looks like this:
['Karnataka', 'Bangalore', 'BangaloreMysoreTumkur']
And I want the output to look like this:
['Karnataka', 'Bangalore', 'Bangalore Mysore Tumkur']
Can you please provide code to fix this?
Thanks for the help!
Answer 1
Score: 1
Because the HTML was not provided as text, I can't be sure of its contents, but you could try passing a separator to .get_text():
item.get_text(' ')
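To see what the separator changes, here is a minimal sketch. The cell markup is an assumption (the question did not include the HTML as text), so the real Confluence storage format may use different tags, such as <br/> instead of <p>.

```python
from bs4 import BeautifulSoup

# Hypothetical cell markup; the real Confluence storage HTML may differ.
cell_html = "<td><p>Bangalore</p><p>Mysore</p><p>Tumkur</p></td>"
cell = BeautifulSoup(cell_html, "html.parser").td

# With no argument, the text fragments are joined with an empty string.
joined_default = cell.get_text()

# The first positional argument is the separator placed between fragments.
joined_spaces = cell.get_text(" ")

print(joined_default)
print(joined_spaces)
```

The first call reproduces the run-together text from the question, and the second puts a space between the fragments.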
Answer 2
Score: 1
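By default, get_text() joins the text fragments inside an element with an empty string, so text that is separated only by tags (for example line breaks or paragraphs inside a cell) runs together. To use a custom separator, use this: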
data = [item.get_text(separator=" ") for item in tables.select("td")]
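Putting it together, the following is a sketch of the whole loop run against a stand-in table. SAMPLE_HTML and the helper name table_rows are hypothetical, used here in place of x["body"]["storage"]["value"]. The strip=True argument is an optional addition that trims each fragment and drops whitespace-only ones, which helps when the markup is indented.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for x["body"]["storage"]["value"].
SAMPLE_HTML = """
<table>
  <tr><th>State</th><th>Capital</th><th>Cities</th></tr>
  <tr>
    <td>Karnataka</td>
    <td>Bangalore</td>
    <td><p>Bangalore</p><p>Mysore</p><p>Tumkur</p></td>
  </tr>
</table>
"""


def table_rows(html):
    """Return one list of cell strings per table row that has <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table tr"):
        # Join the fragments of each cell with a space and trim whitespace.
        cells = [td.get_text(separator=" ", strip=True) for td in tr.select("td")]
        if cells:  # header rows contain only <th>, so they yield no cells
            rows.append(cells)
    return rows


rows = table_rows(SAMPLE_HTML)
print(rows)
```

Skipping rows without <td> cells also avoids printing an empty list for the header row, which the loop in the question would otherwise do.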