Extracting some text in a sentence from a website in Python
# Question
I got stuck while trying to extract some text from the sentences on this [website](http://wiseafrican.isslserv.ng/index.php/category/nigerian-proverbs/yoruba-proverbs/page/5/).
```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

res = requests.get('http://wiseafrican.isslserv.ng/index.php/category/nigerian-proverbs/yoruba-proverbs/page/5/')
soup4 = BeautifulSoup(res.content, 'html.parser')
soup4.findAll('div', 'excerpt')
```

Below is the output. I would like to extract only the sentence before **Translation:** in each HTML tag, and then add those sentences to a `pandas` DataFrame.

```
[<div class="excerpt">
<p>A ki i fi ara eni se oogun alokunna. Translation: One does not use oneself as an ingredient in a medicine requiring that the ingredients be pulverized. Meaning; Self-preservation is a compulsory project for all.</p>
</div>, <div class="excerpt">
<p>A ki i fi ai-mo-we mookun. Translation: One does not dive under water without knowing how to swim. Meaning: Never engage in a project for which you lack the requisite skills.</p>
</div>, <div class="excerpt">
<p>A ki i fi agba sile sin agba. Translation: One does not leave one elder sitting to walk another elder part of his way. meaning: One should not slight one person in order to humor another.</p>
</div>, <div class="excerpt">
<p>A ki i fa ori lehin olori. Translation: One does not shave a head in the absence of the owner. Meaning: One does not settle a matter in the absence of the person most concerned.</p>
</div>, <div class="excerpt">
<p>A ki i duni loye ka fona ile-e Baale hanni. Translation: One does not compete with another for a chieftaincy title and also show the way to the king’s house to the competitor. Meaning: A person should be treated either as an adversary or as an ally, not as both.</p>
</div>, <div class="excerpt">
<p>A ki i du ori olori ki awodi gbe teni lo. Translation: One does not fight to save another person’s head only to have a kite carry one’s own away. Meaning: One should not save other’s at the cost of one’s own safety.</p>
</div>, <div class="excerpt">
<p>A ki i da eru ikun pa ori. Translation: One does not weigh the head down with a load that belongs to the belly. Meaning: Responsibilities should rest where they belong.</p>
</div>, <div class="excerpt">
<p>A ki i da aro nisokun ala la nlo. Translation: One does not engage in a dyeing trade in (isokun) people there wear only white. Meaning Wherever one might be, one should respect the manners and habits of the place.</p>
</div>, <div class="excerpt">
<p>A ki bo sinu omi tan ka maa sa fun otutu. Translation: Does not enter into the water and then run from the cold. Meaning: Precautions are useful only before the event.</p>
</div>, <div class="excerpt">
<p>A fun o lobe o tami si; o gbon ju olobe lo. Translation: You are given some stew and you add water; you must be wiser than the cook. Meaning: Adding water is a means of stretching stew. A person who thus stretches the stew he or she is given would seem to know better than the person who served it how much would suffice for the meal.</p>
</div>]
```
# Answer 1
**Score**: 0
One solution is to put the text into a DataFrame and then use `.str.extract()` to clean the data:
```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

res = requests.get('http://wiseafrican.isslserv.ng/index.php/category/nigerian-proverbs/yoruba-proverbs/page/5/')
soup4 = BeautifulSoup(res.content, 'html.parser')

# One row per excerpt, holding the full text (proverb + translation + meaning)
df = pd.DataFrame([div.get_text(strip=True) for div in soup4.findAll('div', 'excerpt')], columns=['Proverb'])

# Keep only the text before "Translation"; expand=False returns a Series
# (not a one-column DataFrame), so it can be assigned straight back to the column
df['Proverb'] = df['Proverb'].str.extract(r'^(.*)\s+Translation', expand=False)
print(df)
```

Prints:

```
                                       Proverb
0         A ki i fi ara eni se oogun alokunna.
1                   A ki i fi ai-mo-we mookun.
2                A ki i fi agba sile sin agba.
3                   A ki i fa ori lehin olori.
4  A ki i duni loye ka fona ile-e Baale hanni.
5    A ki i du ori olori ki awodi gbe teni lo.
6                   A ki i da eru ikun pa ori.
7            A ki i da aro nisokun ala la nlo.
8    A ki  bo sinu omi tan ka maa sa fun otutu.
9  A fun o lobe o tami si; o gbon ju olobe lo.
```
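To try the `str.extract` step without requesting the live page, here is a minimal sketch that parses two excerpts copied from the question's output in place of `res.content`. `SAMPLE_HTML` is an assumed stand-in for illustration; `find_all('div', class_='excerpt')` is the newer BeautifulSoup spelling of `findAll('div', 'excerpt')`.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Assumed stand-in for res.content: two excerpts copied from the question
SAMPLE_HTML = """
<div class="excerpt">
<p>A ki i fi ara eni se oogun alokunna. Translation: One does not use oneself as an ingredient in a medicine requiring that the ingredients be pulverized. Meaning; Self-preservation is a compulsory project for all.</p>
</div>
<div class="excerpt">
<p>A ki i fi ai-mo-we mookun. Translation: One does not dive under water without knowing how to swim. Meaning: Never engage in a project for which you lack the requisite skills.</p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# get_text(strip=True) drops the whitespace-only text nodes around each <p>
texts = [div.get_text(strip=True) for div in soup.find_all("div", class_="excerpt")]
df = pd.DataFrame(texts, columns=["Proverb"])

# Same pattern as above: capture everything before the whitespace + "Translation"
df["Proverb"] = df["Proverb"].str.extract(r"^(.*)\s+Translation", expand=False)
print(df)
```

Rows whose text contains no `Translation` marker would come back as `NaN` with this pattern, which makes them easy to spot with `df['Proverb'].isna()`.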
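Or apply the `re` module before building the DataFrame:

```python
import re

# r'\1' replaces the whole excerpt with capture group 1 (the proverb),
# dropping "Translation: ..." and everything after it
proverbs = [re.sub(r'^(.*)\s+Translation:.*', r'\1', div.get_text(strip=True))
            for div in soup4.findAll('div', 'excerpt')]
df = pd.DataFrame(proverbs, columns=['Proverb'])
print(df)
```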
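If some excerpts might lack the `Translation:` marker or spell it with different capitalization (the page already mixes `Meaning:` and `meaning:`), splitting on the first marker is a slightly more forgiving variant: the whole string comes back unchanged when the marker is missing. A minimal sketch using only the standard library; the helper name `proverb_only` is hypothetical, and the lowercase `translation:` sample is made up to exercise the `re.IGNORECASE` flag.

```python
import re

# The marker plus any whitespace before it; IGNORECASE also accepts "translation:"
MARKER = re.compile(r"\s*Translation:", re.IGNORECASE)


def proverb_only(text: str) -> str:
    """Return the part of an excerpt before the first 'Translation:' marker."""
    # maxsplit=1: split only at the first marker; the rest is discarded
    return MARKER.split(text, maxsplit=1)[0].strip()


samples = [
    "A ki i fa ori lehin olori. Translation: One does not shave a head in the absence of the owner.",
    "A ki i da eru ikun pa ori. translation: One does not weigh the head down.",  # made-up casing
    "Text without a marker.",
]
for s in samples:
    print(proverb_only(s))
```

In the scraping code, `proverb_only(div.get_text(strip=True))` could take the place of the `re.sub(...)` call inside the list comprehension.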
# Answer 2
**Score**: 0
```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

res = requests.get('http://wiseafrican.isslserv.ng/index.php/category/nigerian-proverbs/yoruba-proverbs/page/5/')
soup4 = BeautifulSoup(res.content, 'html.parser')
data = soup4.findAll('div', 'excerpt')
for i in data:
    # print(i.p.text)  # full excerpt: proverb, translation and meaning
    # Everything before the first "Translation:" is the proverb itself
    print(i.p.text.split('Translation:')[0])
```
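The loop above only prints each proverb, while the question asks for a pandas DataFrame. A minimal sketch of collecting the same `split('Translation:')[0]` results into a list and wrapping it in a DataFrame; the inline `html` string (two shortened excerpts from the question) is an assumed stand-in for `res.content`.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Assumed stand-in for res.content: two shortened excerpts from the question
html = """
<div class="excerpt"><p>A ki i fa ori lehin olori. Translation: One does not shave a head in the absence of the owner.</p></div>
<div class="excerpt"><p>A ki i da eru ikun pa ori. Translation: One does not weigh the head down with a load that belongs to the belly.</p></div>
"""

soup = BeautifulSoup(html, "html.parser")

proverbs = []
for div in soup.find_all("div", class_="excerpt"):
    # Split at the first "Translation:" only; strip() removes the trailing space
    proverbs.append(div.p.get_text().split("Translation:", 1)[0].strip())

df = pd.DataFrame({"Proverb": proverbs})
print(df)
```

The `.strip()` matters here: without it every proverb keeps the space that sat before `Translation:`.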