Create column with year name if all following years meet condition
Question
I have the following dataset:
import pandas as pd

df = pd.DataFrame({
    'UID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Year': [2015, 2016, 2017, 2014, 2015, 2017, 2014, 2015, 2016],
    'Good?': [0, 1, 1, 0, 0, 1, 0, 1, 0]
})
For each UID, I am trying to find the first Year whose 'Good?' value is 1 and for which every later Year of that UID also has 'Good?' equal to 1. If no such year exists, I would like to assign 2017.
I seem to have an indexing problem, because the code throws 'KeyError: #####'. My guess is that some UIDs have only one year value, and that is what triggers the error. This is what I have so far:
# Group the DataFrame by UID
groups = df.groupby('UID')
# Initialize an empty list to store the results
results = []
# Loop over each UID group
for uid, group in groups:
    # Find the first index with a Good value of 1
    first_good_index = group[group['Good?'] == 1].index[0]
    print(first_good_index)
    # Check if all following years have a Good value of 1
    if (group.loc[first_good_index+1:, 'Good?'] == 1).all():
        # If so, append the UID and the year of the first good row to the results list
        results.append((uid, group.loc[first_good_index, 'Year']))
    else:
        results.append((uid, 2017))
# Create a DataFrame from the results
results_df = pd.DataFrame(results, columns=['UID', 'First Good Year'])
# Print the results
print(results_df)
These are the expected results:
results_df = pd.DataFrame({
    'UID': [1, 2, 3],
    'First Good Year': [2016, 2017, 2017],
})
results_df
Answer 1
Score: 1
Use:
# flag rows where Good? equals 1
m = df['Good?'].eq(1)
# flag rows where Good? is not 1 after the first 1 within the same UID
mask = m.groupby(df['UID']).cummax() & ~m
# keep the Good? == 1 rows of UIDs that have no flagged row
df1 = df[~df['UID'].isin(df.loc[mask, 'UID']) & m]
print(df1)
   UID  Year  Good?
1    1  2016      1
2    1  2017      1
5    2  2017      1
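To see why UID 3 drops out at this step, it can help to lay the intermediate Boolean columns side by side. The following is only an illustrative sketch on the sample data; the names `seen_1`, `zero_after_1`, `debug` and `bad_uids` are made up here and are not part of the answer's code.

```python
import pandas as pd

df = pd.DataFrame({
    'UID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Year': [2015, 2016, 2017, 2014, 2015, 2017, 2014, 2015, 2016],
    'Good?': [0, 1, 1, 0, 0, 1, 0, 1, 0],
})

m = df['Good?'].eq(1)
# True from the first 1 onwards within each UID (hypothetical helper name)
seen_1 = m.groupby(df['UID']).cummax()
# True only where a non-1 value appears after that first 1
zero_after_1 = seen_1 & ~m

# Show the flags next to the original rows
debug = df.assign(seen_1=seen_1, zero_after_1=zero_after_1)
print(debug)

# UIDs that break the "all following years are good" rule
bad_uids = df.loc[zero_after_1, 'UID'].unique().tolist()
print(bad_uids)
```

On the sample data only the 2016 row of UID 3 is flagged, so UID 3 is the only one excluded by the `isin` filter.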
# take the first remaining Year per UID; UIDs that were filtered out are added back with 2017
out = (df1.drop_duplicates('UID')
          .set_index('UID')['Year']
          .reindex(df['UID'].unique(), fill_value=2017)
          .reset_index())
print(out)
   UID  Year
0    1  2016
1    2  2017
2    3  2017
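If a loop closer to the one in the question is preferred, one possible sketch is to work by position inside each group instead of by index label. This avoids label arithmetic such as `first_good_index+1`, which may be what raises the KeyError when the index is not sorted, and it also covers groups that contain no 1 at all. The function name `first_good_year` and its `default` parameter are hypothetical, and sorting by `Year` within each UID is an assumption about the intended ordering.

```python
import pandas as pd

def first_good_year(df, default=2017):
    """Hypothetical helper: first Year per UID from which every Good? is 1."""
    rows = []
    # Assumption: "following years" means later rows after sorting by Year
    for uid, group in df.sort_values(['UID', 'Year']).groupby('UID'):
        good = group['Good?'].eq(1).to_numpy()
        years = group['Year'].to_numpy()
        year = default
        if good.any():
            first = good.argmax()       # position of the first True
            if good[first:].all():      # every later row must also be good
                year = int(years[first])
        rows.append((uid, year))
    return pd.DataFrame(rows, columns=['UID', 'First Good Year'])

df = pd.DataFrame({
    'UID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Year': [2015, 2016, 2017, 2014, 2015, 2017, 2014, 2015, 2016],
    'Good?': [0, 1, 1, 0, 0, 1, 0, 1, 0],
})
results_df = first_good_year(df)
print(results_df)
```

Because the helper never looks up index labels, it should give the same result on a frame whose rows have been shuffled or whose index has gaps.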