How to extract a datetime from a chunk of HTML

Question


I have a piece of HTML that includes a datetime, like this:

<time datetime="2023-01-06 05:00:00" data-format="article-display" data-show-date="always" data-show-time="today-only" data-timestamp="1672981200" itemprop="datePublished" class="author-details__timestamp formatTimeStampEs6" full-date="05.01.2023">6th January</time>

I copied the element's path from the Chrome inspector and got this:

#article > div.mar-article > div > div.mar-article__timestamp > time

from bs4 import BeautifulSoup

def extract_time(data):
    """Extract the time from the HTML of the article page."""
    soup = BeautifulSoup(data, 'html.parser')
    # Use the find() method to look up the time element by tag name and class
    time_element = soup.find("time", class_="datetime")
    print(time_element)
    return time_element

Why does it return None?

I'm also confused because I don't know how to return just the datetime.
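As an aside, the selector Chrome copied is ordinary CSS, and BeautifulSoup's select_one() accepts it directly. The sketch below runs it against a hypothetical, stripped-down page skeleton. The wrapper markup is invented only so the selector has something to match, and the real page will contain much more. One caveat: the inspector shows the DOM after JavaScript has run. If the element is inserted by a script, it may be missing from HTML fetched without a browser, and select_one() would then return None.

```python
from bs4 import BeautifulSoup

# Hypothetical page skeleton: only the nesting the copied selector expects
html = """
<div id="article">
  <div class="mar-article">
    <div>
      <div class="mar-article__timestamp">
        <time datetime="2023-01-06 05:00:00">6th January</time>
      </div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The selector copied from Chrome is plain CSS, so select_one() can use it as-is
selector = "#article > div.mar-article > div > div.mar-article__timestamp > time"
time_element = soup.select_one(selector)

# select_one() returns None when nothing matches, so guard before reading the attribute
published = time_element.get("datetime") if time_element is not None else None
print(published)  # 2023-01-06 05:00:00
```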

Answer 1

Score: 1

The element does not have a class called datetime, but you can select it by its datetime attribute (provided that the corresponding element is also present in the soup):

soup.select_one('time[datetime]').get('datetime')
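If you would rather keep find() as in the question, the same lookup can be expressed with its attrs argument. This is a minimal sketch using the markup from the question. It shows that class_="datetime" matches nothing, that attrs={"datetime": True} matches any time tag that has a datetime attribute, and that one of the element's real classes also works with class_.

```python
from bs4 import BeautifulSoup

# Sample markup taken from the question
html = ('<time datetime="2023-01-06 05:00:00" data-format="article-display" '
        'data-show-date="always" data-show-time="today-only" data-timestamp="1672981200" '
        'itemprop="datePublished" class="author-details__timestamp formatTimeStampEs6" '
        'full-date="05.01.2023">6th January</time>')
soup = BeautifulSoup(html, "html.parser")

# class_="datetime" looks for a CSS class named "datetime"; the element has none
missing = soup.find("time", class_="datetime")

# attrs={"datetime": True} matches a <time> that has a datetime attribute with any value
by_attr = soup.find("time", attrs={"datetime": True})

# class_ matches any single class in a multi-valued class attribute
by_class = soup.find("time", class_="author-details__timestamp")

# Read the attribute value by indexing the tag like a dict
value = by_attr["datetime"] if by_attr is not None else None
print(missing)  # None
print(value)    # 2023-01-06 05:00:00
```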

Example

from bs4 import BeautifulSoup
soup = BeautifulSoup('<time datetime="2023-01-06 05:00:00" data-format="article-display" data-show-date="always" data-show-time="today-only" data-timestamp="1672981200" itemprop="datePublished" class="author-details__timestamp formatTimeStampEs6" full-date="05.01.2023">6th January</time>', 'html.parser')

print(soup.select_one('time[datetime]').get('datetime'))

Output

2023-01-06 05:00:00
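If you need a datetime object rather than a string, the value returned by .get('datetime') in the example above can be parsed with the standard library. This is a sketch under two assumptions: that the attribute always uses the "YYYY-MM-DD HH:MM:SS" layout, and that data-timestamp holds Unix epoch seconds. The markup gives no time zone for the datetime attribute, so parsing it produces a naive datetime. For this sample, the epoch value happens to agree with it when read as UTC.

```python
from datetime import datetime, timezone

# String returned by .get('datetime') in the example above
raw = "2023-01-06 05:00:00"

# Assumption: the attribute always uses "YYYY-MM-DD HH:MM:SS";
# no time zone is given, so the result is a naive datetime
published = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")

# Assumption: data-timestamp="1672981200" holds Unix epoch seconds
published_utc = datetime.fromtimestamp(1672981200, tz=timezone.utc)

print(published)      # 2023-01-06 05:00:00
print(published_utc)  # 2023-01-06 05:00:00+00:00
```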

huangapple
  • This article was posted on January 9, 2023 at 01:05:20.
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75049756.html