March 15, 2023, 21:35:49

How to remove text between () in a python list?

Question

I am still quite inexperienced and hope you can help me. I have tried many different approaches to remove the text between () from the strings in a Python 3 list (e.g. regex, split, join, replace, remove, lambda), but none of them has worked so far.

#!/usr/bin/python3

import requests
from bs4 import BeautifulSoup

data = requests.get('https://www.iban.com/country-codes')
soup = BeautifulSoup(data.text, "lxml")

output = [[cell.get_text(strip=True) for cell in row.find_all('td')[0:2]]
          for row in soup.find_all("tr")]

for i in range(1, len(output)):
    print('Name of country "' + (output[i][0]) + '"')
    print('Country code"' + (output[i][1]) + '"')
    print('')

This code produces the following output

Name of country "United States of America (the)"
Country code"US"

Name of country "Venezuela (Bolivarian Republic of)"
Country code"VE"

But I would like to remove all text between (), e.g. "(the)" and "(Bolivarian Republic of)" in the example output above, so that the output would be:

Name of country "United States of America"
Country code"US"

Name of country "Venezuela"
Country code"VE"

I am very sorry if this question has been answered before, but I searched for a long time and could not find an answer before giving up.

Answer 1

Score: 1

You could simply use .split() to split each string on ' (' and take the first element of the result:

[e.split(' (')[0] for e in row.stripped_strings]
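To see the effect without fetching the page, here is a minimal offline sketch of the same idea. The sample strings are copied from the question's output, and `strip_parenthetical` is a hypothetical helper name used only for illustration; it is not part of the original answer.

```python
def strip_parenthetical(name: str) -> str:
    """Return the part of name before the first ' ('; unchanged if there is none."""
    return name.split(' (')[0]


# Sample country names as they appear in the question's output.
names = [
    "United States of America (the)",
    "Venezuela (Bolivarian Republic of)",
    "Denmark",
]

cleaned = [strip_parenthetical(n) for n in names]
print(cleaned)
```

Note that this drops everything from the first ' (' onwards, so any text that follows the closing parenthesis would be lost as well. For the names shown in the question that makes no difference, because the parenthesised part is always at the end.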

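Example

Instead of just printing the results, convert and store them in a more structured way that can easily be transformed into other formats.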

import requests
from bs4 import BeautifulSoup

data = requests.get('https://www.iban.com/country-codes')
soup = BeautifulSoup(data.text, "lxml")

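# Read the column names once from the header row (the row containing <th> cells).
headers = list(soup.select_one('table tr:has(th)').stripped_strings)

d = []
for row in soup.select('table tbody tr'):
    # Drop everything from ' (' onwards in each cell, then pair the cells with the headers.
    cells = [e.split(' (')[0] for e in row.stripped_strings]
    d.append(dict(zip(headers, cells)))
print(d)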

Output for the US

{'Country': 'United States of America', 'Alpha-2 code': 'US', 'Alpha-3 code': 'USA', 'Numeric': '840'}
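Since the question mentions trying regex, a regular-expression variant might look like the sketch below. It is an alternative to the .split() approach above, not the answerer's method. The `output` list is hard-coded sample data shaped like the question's list (an empty first entry for the header row, then [name, code] pairs), and `remove_parentheses` is a hypothetical helper name.

```python
import re

# Matches optional whitespace followed by one non-nested "(...)" group.
PAREN = re.compile(r"\s*\([^()]*\)")


def remove_parentheses(text: str) -> str:
    """Remove every non-nested parenthesised group from text."""
    return PAREN.sub("", text).strip()


# Sample data in the same shape as the question's `output` list:
# the header row has no <td> cells, so its entry is empty.
output = [
    [],
    ["United States of America (the)", "US"],
    ["Venezuela (Bolivarian Republic of)", "VE"],
]

lines = []
for name, code in output[1:]:
    lines.append('Name of country "' + remove_parentheses(name) + '"')
    lines.append('Country code"' + code + '"')
    lines.append('')

print("\n".join(lines))
```

Unlike .split(' (')[0], this keeps any text that comes after the closing parenthesis, which may matter if a parenthesised part ever appears in the middle of a name.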
