Complicated JSON to DataFrame
Question
I have 100 URLs, and opening each one shows a JSON file.
But the JSON is a bit complicated; it looks like this:
{
  "release": [
    {
      "id": "1234",
      "version": "1.0",
      "releaseDate": "2023-07-31",
      "xxx": "ssss",
      "yyy": "uuuu"
    },
    {
      "id": "2345",
      "version": "1.1",
      "releaseDate": "2023-05-12",
      "xxx": "sssss",
      .....
    }
  ],
  "user": false
}
I want to count the releases from the past 6 months, but because the JSON is complicated, the usual approaches (json.loads, pd.read_json, normalize...) don't work.
Also, the .... actually contains some HTML tags like the one below, so it would be better to select only "releaseDate" for filtering:
"att":"<p><em>as Alice</em> for.....
What I tried
I can count the releases for all time with:
releases = len(json_data['release'])
But how can I limit it to the past 6 months?
Any help is really appreciated!
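A quick check suggests the HTML fragments are not what breaks parsing: to json.loads, markup inside a string value is just text, as long as the document itself is valid JSON (every key/value pair and every array element separated by a comma). The sketch below assumes a payload shaped like the example, with a made-up "att" value modeled on the question's, and keeps only "releaseDate".

```python
import json

# Hypothetical payload shaped like the question's JSON; the "att" value is
# invented for illustration and simply contains HTML markup.
raw = '''{
  "release": [
    {"id": "1234", "version": "1.0", "releaseDate": "2023-07-31",
     "att": "<p><em>as Alice</em></p>"},
    {"id": "2345", "version": "1.1", "releaseDate": "2023-05-12"}
  ],
  "user": false
}'''

# HTML inside a JSON string is ordinary text to the parser
json_data = json.loads(raw)

# Keep only the field needed for filtering; .get() tolerates a missing key
release_dates = [r.get("releaseDate") for r in json_data["release"]]
print(release_dates)  # ['2023-07-31', '2023-05-12']
```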
Answer 1
Score: 1
Create a string that contains the date from six months ago:
six_months_ago = "2023-02-28"
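Then use len() with a list comprehension that keeps only the items released on or after that date. Because the dates are ISO-formatted ("YYYY-MM-DD") strings, comparing them as plain strings gives chronological order:
releases = len([r for r in json_data["release"] if r["releaseDate"] >= six_months_ago])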
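To avoid hard-coding the cutoff, it can be computed from today's date with the standard library. A minimal sketch: `months_ago` and `count_recent` are hypothetical helper names, and the day is clamped to the end of the target month (so Aug 31 minus 6 months gives Feb 28), which is one reasonable convention rather than the only one.

```python
import calendar
from datetime import date


def months_ago(today: date, months: int) -> date:
    """Return the date `months` calendar months before `today`.

    The day is clamped to the last day of the target month,
    e.g. 2023-08-31 minus 6 months -> 2023-02-28.
    """
    # Count months on a single axis so the year rolls back correctly
    total = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(today.day, last_day))


def count_recent(json_data: dict, today: date, months: int = 6) -> int:
    """Count releases dated on or after the cutoff (ISO strings sort chronologically)."""
    cutoff = months_ago(today, months).isoformat()  # e.g. "2023-02-28"
    # A missing releaseDate becomes "", which never passes the comparison
    return sum(1 for r in json_data["release"] if r.get("releaseDate", "") >= cutoff)


sample = {
    "release": [
        {"id": "1234", "version": "1.0", "releaseDate": "2023-07-31"},
        {"id": "2345", "version": "1.1", "releaseDate": "2023-05-12"},
        {"id": "485", "version": "1.2", "releaseDate": "2022-05-12"},
    ],
    "user": False,
}
print(count_recent(sample, date(2023, 8, 31)))  # 2
```

In real use, pass `date.today()` as `today`.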
Answer 2
Score: 1
Consider this example:
import json
import pandas as pd
json_string = r"""{
"release": [
{
"id":"1234",
"version":"1.0",
"releaseDate":"2023-07-31",
"xxx": "ssss",
"yyy": "uuuu" },
{
"id" :"2345",
"version": "1.1",
"releaseDate":"2023-05-12",
"xxx":"sssss"},
{
"id" :"485",
"version": "1.2",
"releaseDate":"2022-05-12",
"xxx":"sssss"}
],
"user":false
}"""
data = json.loads(json_string)
df = pd.DataFrame(data["release"])
df["releaseDate"] = pd.to_datetime(df["releaseDate"], dayfirst=False)
print(df)
Prints:
id version releaseDate xxx yyy
0 1234 1.0 2023-07-31 ssss uuuu
1 2345 1.1 2023-05-12 sssss NaN
2 485 1.2 2022-05-12 sssss NaN
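Since the question mentions that normalize did not work, note that pd.json_normalize can be pointed at the nested list through its `record_path` parameter, which yields one row per release and leaves missing keys as NaN. A sketch assuming the same parsed `data` as above, with an explicit date format instead of `dayfirst`:

```python
import pandas as pd

# Same structure as the parsed example above
data = {
    "release": [
        {"id": "1234", "version": "1.0", "releaseDate": "2023-07-31", "xxx": "ssss", "yyy": "uuuu"},
        {"id": "2345", "version": "1.1", "releaseDate": "2023-05-12", "xxx": "sssss"},
        {"id": "485", "version": "1.2", "releaseDate": "2022-05-12", "xxx": "sssss"},
    ],
    "user": False,
}

# record_path selects the nested list; each dict in it becomes one row
df = pd.json_normalize(data, record_path="release")

# An explicit ISO format avoids any day/month ambiguity
df["releaseDate"] = pd.to_datetime(df["releaseDate"], format="%Y-%m-%d")
print(df)
```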
Then, to filter this DataFrame, you can do:
now_minus_6_months = pd.Timestamp.now() - pd.DateOffset(months=6)
print(df[df["releaseDate"] > now_minus_6_months])
Prints:
id version releaseDate xxx yyy
0 1234 1.0 2023-07-31 ssss uuuu
1 2345 1.1 2023-05-12 sssss NaN
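The question asks for a count per URL rather than the filtered rows. A minimal sketch, assuming each URL returns JSON shaped like the example and needs no authentication: summing the boolean mask gives the count, and a loop applies it to every URL. `count_recent_releases`, `fetch_json` and the `urls` list are hypothetical names; the real 100 URLs would go into `urls`.

```python
import json
import urllib.request

import pandas as pd


def count_recent_releases(data: dict, now: pd.Timestamp, months: int = 6) -> int:
    """Count releases whose releaseDate is later than `now` minus `months` months."""
    df = pd.json_normalize(data, record_path="release")
    if df.empty or "releaseDate" not in df:
        return 0
    # Missing dates become NaT, which never compares as later than the cutoff
    dates = pd.to_datetime(df["releaseDate"], format="%Y-%m-%d")
    cutoff = now - pd.DateOffset(months=months)
    # Summing a boolean Series counts the True values
    return int((dates > cutoff).sum())


def fetch_json(url: str) -> dict:
    """Download one URL and parse the body as JSON (no retries or auth handling)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical placeholder: fill in the real URLs
    urls: list[str] = []
    now = pd.Timestamp.now()
    counts = {url: count_recent_releases(fetch_json(url), now) for url in urls}
    print(counts)
```

Passing `now` in explicitly, rather than calling `pd.Timestamp.now()` inside the function, keeps every URL measured against the same cutoff and makes the function easy to test with a fixed date.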