groupby cumsum (or cumcount) with cyclical data
Question
I have a DataFrame that looks like this:
| ID | SWITCH |
|---|---|
| A | ON |
| A | ON |
| A | ON |
| A | OFF |
| A | OFF |
| A | OFF |
| A | ON |
| A | ON |
| A | ON |
| ... | ... |
| B | ON |
| B | ON |
| B | ON |
| B | OFF |
| B | OFF |
| B | OFF |
| B | ON |
| B | ON |
| B | ON |
The 'SWITCH' column is cyclical, and I'd like a running count of the ON and OFF rows within each cycle, like this:
| ID | SWITCH | Cum. Count |
|---|---|---|
| A | ON | 1 |
| A | ON | 2 |
| A | ON | 3 |
| A | OFF | 1 |
| A | OFF | 2 |
| A | OFF | 3 |
| A | ON | 1 |
| A | ON | 2 |
| A | ON | 3 |
| ... | ... | |
| B | ON | 1 |
| B | ON | 2 |
| B | OFF | 1 |
| B | OFF | 2 |
| B | OFF | 3 |
| B | ON | 1 |
| B | ON | 2 |
| B | ON | 3 |
I tried cumsum and cumcount, but neither resets the count when the next 'ON' cycle starts (the count just continues from the previous cycle).
What can I do?
Answer 1
Score: 1
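Try also grouping by the cumulative sum of the value changes, which gives each consecutive run of 'SWITCH' values its own block ID:
# A new block starts wherever SWITCH differs from the previous row
switch_blocks = df['SWITCH'].ne(df['SWITCH'].shift()).cumsum()
# Count rows within each (ID, block) pair, starting from 1
df['cum.count'] = df.groupby(['ID', switch_blocks]).cumcount().add(1)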
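For reference, here is a minimal, self-contained sketch of the approach above. The sample data is an assumption modeled on the question's tables, and the `naive` column is only a guess at what the original attempt looked like, included to show why grouping by `['ID', 'SWITCH']` alone does not reset the count.

```python
import pandas as pd

# Assumed sample data, modeled on the tables in the question
df = pd.DataFrame({
    'ID': ['A'] * 9 + ['B'] * 9,
    'SWITCH': (['ON'] * 3 + ['OFF'] * 3 + ['ON'] * 3) * 2,
})

# Naive attempt (an assumption about what was tried): every 'ON' row of an ID
# lands in the same group, so the second ON cycle continues at 4, 5, 6.
df['naive'] = df.groupby(['ID', 'SWITCH']).cumcount().add(1)

# Block ID: increases by 1 each time SWITCH differs from the previous row
switch_blocks = df['SWITCH'].ne(df['SWITCH'].shift()).cumsum()

# Grouping by (ID, block) restarts the count for every consecutive run
df['cum.count'] = df.groupby(['ID', switch_blocks]).cumcount().add(1)

print(df)
```

Keeping 'ID' in the group key matters here: row 9 (the first 'B' row) has the same 'SWITCH' value as row 8 (the last 'A' row), so the two share a block ID, and only the 'ID' key separates them.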
Answer 2
Score: 1
You need to create a new column that marks each change in the 'SWITCH' column; you can then group by 'ID' and the cumulative sum of that column to perform the cumulative count.
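import pandas as pd
# Create sample data
df = pd.DataFrame({'ID': ['A'] * 9 + ['B'] * 9,
                   'SWITCH': ['ON'] * 3 + ['OFF'] * 3 + ['ON'] * 3 + ['ON'] * 3 + ['OFF'] * 3 + ['OFF'] * 3})
# 1 where SWITCH differs from the previous row (the first row always counts as a change), else 0
df['SWITCH_CHANGE'] = (df['SWITCH'] != df['SWITCH'].shift()).astype(int)
# The running sum of SWITCH_CHANGE labels each consecutive run; count rows within each (ID, run), starting from 1
df['Cum. Count'] = df.groupby(['ID', df.SWITCH_CHANGE.cumsum()])['SWITCH'].cumcount() + 1
print(df)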
Result:
| | ID | SWITCH | SWITCH_CHANGE | Cum. Count |
|---|---|---|---|---|
| 0 | A | ON | 1 | 1 |
| 1 | A | ON | 0 | 2 |
| 2 | A | ON | 0 | 3 |
| 3 | A | OFF | 1 | 1 |
| 4 | A | OFF | 0 | 2 |
| 5 | A | OFF | 0 | 3 |
| 6 | A | ON | 1 | 1 |
| 7 | A | ON | 0 | 2 |
| 8 | A | ON | 0 | 3 |
| 9 | B | ON | 0 | 1 |
| 10 | B | ON | 0 | 2 |
| 11 | B | ON | 0 | 3 |
| 12 | B | OFF | 1 | 1 |
| 13 | B | OFF | 0 | 2 |
| 14 | B | OFF | 0 | 3 |
| 15 | B | OFF | 0 | 4 |
| 16 | B | OFF | 0 | 5 |
| 17 | B | OFF | 0 | 6 |
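As the result shows, row 9 has SWITCH_CHANGE equal to 0 because the shift compares the first 'B' row with the last 'A' row; the count is still correct only because 'ID' is part of the group key. A possible variant, sketched below, detects changes per ID instead, which also works if the rows of different IDs are interleaved. The helper name `count_within_runs` and its parameters are hypothetical, not part of either answer.

```python
import pandas as pd

def count_within_runs(df, id_col, value_col):
    """Hypothetical helper: 1-based running count within each consecutive
    run of equal values in `value_col`, computed separately per `id_col`."""
    # Previous value within the same ID (NaN for the first row of each ID)
    prev = df.groupby(id_col)[value_col].shift()
    # 1 where a new run starts for that ID, else 0
    new_run = df[value_col].ne(prev).astype(int)
    # Per-ID run number: running sum of run starts within each ID
    run_id = new_run.groupby(df[id_col]).cumsum()
    # Count rows within each (ID, run) pair, starting from 1
    return df.groupby([id_col, run_id]).cumcount().add(1)

# Same sample data as in the answer above
df = pd.DataFrame({
    'ID': ['A'] * 9 + ['B'] * 9,
    'SWITCH': ['ON'] * 3 + ['OFF'] * 3 + ['ON'] * 3 + ['ON'] * 3 + ['OFF'] * 6,
})
df['Cum. Count'] = count_within_runs(df, 'ID', 'SWITCH')
print(df)
```

On this sample the output matches the 'Cum. Count' column in the table above; the difference only shows up when rows of one ID are not stored contiguously.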