plt.scatter plot turns out blank

Question

The code below returns a blank plot in Python:

# import libraries
import pandas as pd
import os
import matplotlib.pyplot as plt
import numpy as np

os.chdir('file path')

# import data files (raw strings keep the Windows backslashes literal)
activity = pd.read_csv(r'file path\dailyActivity_merged.csv')
intensity = pd.read_csv(r'file path\hourlyIntensities_merged.csv')
steps = pd.read_csv(r'file path\hourlySteps_merged.csv')
sleep = pd.read_csv(r'file path\sleepDay_merged.csv')

# ActivityDate in activity df only includes dates (no time). Rename it Dates
activity = activity.rename(columns={'ActivityDate': 'Dates'})

# ActivityHour in intensity df and steps df includes date-time.
# Split the date-time column into dates and times in intensity. Drop the date-time column
intensity['Dates'] = pd.to_datetime(intensity['ActivityHour']).dt.date
intensity['Times'] = pd.to_datetime(intensity['ActivityHour']).dt.time
intensity = intensity.drop(columns=['ActivityHour'])

# split date-time column into dates and times in steps. Drop the date-time column
steps['Dates'] = pd.to_datetime(steps['ActivityHour']).dt.date
steps['Times'] = pd.to_datetime(steps['ActivityHour']).dt.time
steps = steps.drop(columns=['ActivityHour'])

# split date-time column into dates and times in sleep. Drop the date-time column
sleep['Dates'] = pd.to_datetime(sleep['SleepDate']).dt.date
sleep['Times'] = pd.to_datetime(sleep['SleepDate']).dt.time
sleep = sleep.drop(columns=['SleepDate', 'TotalSleepRecords'])

# add a column & calculate time_awake_in_bed before falling asleep
sleep['time_awake_in_bed'] = sleep['TotalTimeInBed'] - sleep['TotalMinutesAsleep']

# merge activity and sleep (merge_keys avoids shadowing the built-in list)
merge_keys = ['Id', 'Dates']
activity_sleep = sleep.merge(activity,
                             on=merge_keys,
                             how='outer')

# plot relation between calories used daily vs how long it takes users to fall asleep
plt.scatter(activity_sleep['time_awake_in_bed'], activity_sleep['Calories'], s=20, c='b', marker='o')
plt.axis([0, 200, 0, 5000])
plt.show()

NOTE: max(Calories) = 4900 and min(Calories) = 0. max(time_awake_in_bed) = 150 and min(time_awake_in_bed) = 0.

Please let me know how I can get a scatter plot out of this. Thank you in advance for any help.

The same variables from the same data-frame work perfectly with geom_point() in R.

Answer 1

Score: 1

I found the problem. As @Redox and @cheersmate mentioned in the comments, the data frame I created by merging contained NaN values. I fixed this by merging only on 'Id', after which I could create a scatter plot:

merge_keys = ['Id']
activity_sleep = sleep.merge(activity,
                             on=merge_keys,
                             how='outer')
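
To see where such NaN values can come from, here is a minimal sketch with made-up rows. It assumes, as the question's code suggests, that sleep['Dates'] holds datetime.date objects (from .dt.date) while activity['Dates'] still holds the raw strings from the CSV. The sample values, the `indicator=True` argument, and the name `plottable` are illustrative assumptions, not part of the original post.

```python
import datetime as dt

import pandas as pd

# Made-up rows that mimic the two frames in the question.
# sleep["Dates"] holds datetime.date objects (what .dt.date returns).
sleep = pd.DataFrame({
    "Id": [1, 1],
    "Dates": [dt.date(2016, 4, 12), dt.date(2016, 4, 13)],
    "time_awake_in_bed": [20, 35],
})
# activity["Dates"] is kept as plain Python strings (object dtype) for this sketch.
activity = pd.DataFrame({
    "Id": [1, 1],
    "Dates": pd.Series(["4/12/2016", "4/13/2016"], dtype=object),
    "Calories": [1985, 1797],
})

# indicator=True adds a "_merge" column: left_only, right_only, or both.
merged = sleep.merge(activity, on=["Id", "Dates"], how="outer", indicator=True)
print(merged["_merge"].value_counts())

# Only rows with both coordinates present can appear in a scatter plot.
plottable = merged.dropna(subset=["time_awake_in_bed", "Calories"])
print(len(merged), "rows in total,", len(plottable), "plottable")
```

In this sketch no key pair matches, so every row is missing either time_awake_in_bed or Calories. plt.scatter does not draw points with a NaN coordinate, which would be consistent with the blank plot described in the question.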

The "Dates" column is not a good one to merge on, because in each data frame the same dates are repeated across multiple rows. I also noticed that I get the same plot whether I use an outer or an inner merge. Thank you.
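
As a possible alternative, offered only as a sketch under assumptions, both date columns can be converted to the same type before merging on 'Id' and 'Dates' together. The sample rows, the format strings, and the names `by_day` and `by_id_only` are hypothetical, and the real CSV files may need different formats. The sketch also counts the rows produced by merging on 'Id' alone, which pairs every sleep row with every activity row of the same user.

```python
import pandas as pd

# Made-up rows; the column names follow the question, the values are invented.
sleep = pd.DataFrame({
    "Id": [1, 1, 2],
    "SleepDate": ["4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM",
                  "4/12/2016 12:00:00 AM"],
    "TotalMinutesAsleep": [327, 384, 412],
    "TotalTimeInBed": [346, 407, 442],
})
activity = pd.DataFrame({
    "Id": [1, 1, 2, 2],
    "ActivityDate": ["4/12/2016", "4/13/2016", "4/12/2016", "4/13/2016"],
    "Calories": [1985, 1797, 2100, 2250],
})

# Convert both columns to datetime values at midnight so the keys are comparable.
# The format strings are assumptions about how the dates are written.
sleep["Dates"] = pd.to_datetime(
    sleep["SleepDate"], format="%m/%d/%Y %I:%M:%S %p"
).dt.normalize()
activity["Dates"] = pd.to_datetime(activity["ActivityDate"], format="%m/%d/%Y")

sleep["time_awake_in_bed"] = sleep["TotalTimeInBed"] - sleep["TotalMinutesAsleep"]

# One row per (Id, Dates) pair that exists in both frames.
by_day = sleep.merge(activity, on=["Id", "Dates"], how="inner")

# Merging on Id alone pairs each sleep row with all activity rows of that Id.
by_id_only = sleep.merge(activity, on=["Id"], how="inner")

print(len(by_day), "rows when merging on Id and Dates")
print(len(by_id_only), "rows when merging on Id only")
```

With keys of matching type, by_day['time_awake_in_bed'] and by_day['Calories'] contain no NaN values and can be passed to plt.scatter directly, with each point describing a single user on a single day.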

huangapple
  • Posted on January 9, 2023 at 11:57:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75053057.html