Question

I have the following dataset:

import pandas as pd

df = pd.DataFrame({
    'UID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Year': [2015, 2016, 2017, 2014, 2015, 2017, 2014, 2015, 2016],
    'Good?': [0, 1, 1, 0, 0, 1, 0, 1, 0]
})

For each UID, I am trying to find the first Year whose 'Good?' value is 1 and for which all following Years of that UID also have 'Good?' equal to 1. If the condition is not met, I would like to assign the value 2017.

I seem to have a problem with the indexing, as the code throws a 'KeyError: #####'. I guess there are cases where I have only one year value, and that is causing the error. This is what I have so far:

# Group the DataFrame by UID
groups = df.groupby('UID')

# Initialize an empty list to store the results
results = []

# Loop over each UID group
for uid, group in groups:
    # Find the first index with a Good value of 1
    first_good_index = group[group['Good?'] == 1].index[0]
    print(first_good_index)

    # Check if all following years have a Good value of 1
    if (group.loc[first_good_index+1:, 'Good?'] == 1).all():
        # If so, append the UID and the year of the first good row to the results list
        results.append((uid, group.loc[first_good_index, 'Year']))
    else:
        results.append((uid, 2017))

# Create a DataFrame from the results
results_df = pd.DataFrame(results, columns=['UID', 'First Good Year'])

# Print the results
print(results_df)

These are the expected results:

results_df = pd.DataFrame({
    'UID': [1, 2, 3],
    'First Good Year': [2016, 2017, 2017],
})

results_df

Answer 1

Score: 1

Use:

# flag the rows where 'Good?' is 1
m = df['Good?'].eq(1)

# flag the rows that are not 1 but come after the first 1 within their UID
mask = m.groupby(df['UID']).cummax() & ~m

# keep only the good rows of UIDs that have no flagged row
df1 = df[~df['UID'].isin(df.loc[mask, 'UID']) & m]
print(df1)
   UID  Year  Good?
1    1  2016      1
2    1  2017      1
5    2  2017      1
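
To see why only UID 3 is excluded at this step, it can help to lay the intermediate masks side by side. The following is a small sketch using the question's sample data; the names `seen_good`, `bad_after_good`, `steps` and `bad_uids` are illustrative and do not appear in the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'UID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Year': [2015, 2016, 2017, 2014, 2015, 2017, 2014, 2015, 2016],
    'Good?': [0, 1, 1, 0, 0, 1, 0, 1, 0],
})

# True where the row itself is good
m = df['Good?'].eq(1)

# True from the first good row of each UID onwards (running maximum per group)
seen_good = m.groupby(df['UID']).cummax()

# True where a good row has already been seen for this UID but this row is not good
mask = seen_good & ~m

# Put the three masks next to the data for inspection
steps = df.assign(is_good=m, seen_good=seen_good, bad_after_good=mask)
print(steps)

# UIDs that have at least one bad row after their first good row
bad_uids = df.loc[mask, 'UID'].unique().tolist()
print(bad_uids)
```

Only the last row of UID 3 (Year 2016) is flagged in `bad_after_good`, so UID 3 is the only group dropped before the first good year is picked.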

# take the first row per `UID` and add the missing `UID`s back, filled with 2017
out = (df1.drop_duplicates('UID')
          .set_index('UID')['Year']
          .reindex(df['UID'].unique(), fill_value=2017)
          .reset_index())
print(out)
   UID  Year
0    1  2016
1    2  2017
2    3  2017
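
If the loop from the question is preferred, one possible fix is to work with positions inside each group rather than index labels. The real data is not shown, so the cause of the `KeyError` is an assumption here: slicing with `.loc[first_good_index+1:]` relies on the label `first_good_index+1` being resolvable, which can fail when the index is not a sorted, unique integer range. The sketch below avoids labels entirely and also handles groups with no good row. The function name `first_good_year` and its `default` parameter are hypothetical, and the rows are assumed to be ordered by Year within each UID.

```python
import pandas as pd


def first_good_year(df, default=2017):
    """Return one row per UID with the first Year from which all rows are good."""
    rows = []
    for uid, group in df.groupby('UID'):
        # Boolean NumPy array, so everything below is positional
        good = group['Good?'].eq(1).to_numpy()
        if not good.any():
            # No good row at all for this UID
            rows.append((uid, default))
            continue
        # Position of the first good row within the group
        pos = int(good.argmax())
        if good[pos:].all():
            # Every row from the first good one onwards is good
            rows.append((uid, group['Year'].iloc[pos]))
        else:
            rows.append((uid, default))
    return pd.DataFrame(rows, columns=['UID', 'First Good Year'])


df = pd.DataFrame({
    'UID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Year': [2015, 2016, 2017, 2014, 2015, 2017, 2014, 2015, 2016],
    'Good?': [0, 1, 1, 0, 0, 1, 0, 1, 0],
})

results_df = first_good_year(df)
print(results_df)
```

Because only positions are used, the result does not depend on what the index labels look like, at the cost of a Python-level loop that will be slower than the vectorized version on large frames.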
