June 13, 2023, 03:59:32

ID column based on condition in another column

Question

Hej,
I have the following column A, which shows continuous data sampling. If the value changes, the sensor is measuring data; if the value repeats, the sensor is not measuring actual values.

I would like to loop through A and assign an ID column under the following condition: while the measurement is running and the values change continuously, the ID should stay the same. When the values stop changing, the ID should increment by one. When the values start changing again, the ID should increment once more, and so forth.

This is what I have tried so far, but it does not assign the ID accurately:

import pandas as pd
data = [3.5, 3.6, 3.7, 3.8, 1, 1, 1, 1, 1, 3.9, 4.0, 4.2, 4.4, 4.6, 4.8, 3,
        3, 3, 3, 3.2, 3.3, 3.5, 2.1, 2.1, 2.1, 2.1]
df = pd.DataFrame({'A': data})
# Initialize ID
df['ID'] = 0
# Assign ID
id_value = 1
for i in range(1, len(df)):
    if df.at[i, 'A'] != df.at[i-1, 'A']:
        df.at[i, 'ID'] = df.at[i-1, 'ID']
    else:
        df.at[i, 'ID'] = id_value
        id_value += 1
print(df)
Expected output:
      A  ID
0   3.5   0
1   3.6   0
2   3.7   0
3   3.8   0
4   1.0   1
5   1.0   1
6   1.0   1
7   1.0   1
8   1.0   1
9   3.9   2
10  4.0   2

…

In my output the ID is incremented continuously, so each repeated row gets its own ID.

Answer 1

Score: 2

If the expected output is correct, you can do this by comparing each value with both the previous and the next row using shift. Assign the result to a new column so that a cumulative sum can be applied to it.

>>> df = pd.DataFrame({'A': data})
>>> df['Changing'] = (df['A'] != df['A'].shift(-1)) & (df['A'] != df['A'].shift())
>>> df
      A  Changing
0   3.5      True
1   3.6      True
2   3.7      True
3   3.8      True
4   1.0     False
5   1.0     False
6   1.0     False
7   1.0     False
8   1.0     False
9   3.9      True
10  4.0      True
11  4.2      True
12  4.4      True
13  4.6      True
14  4.8      True
15  3.0     False
16  3.0     False
17  3.0     False
18  3.0     False
19  3.2      True
20  3.3      True
21  3.5      True
22  2.1     False
23  2.1     False
24  2.1     False
25  2.1     False
>>> df['ID'] = (df['Changing'] != df['Changing'].shift()).cumsum() - 1
>>> df
      A  Changing  ID
0   3.5      True   0
1   3.6      True   0
2   3.7      True   0
3   3.8      True   0
4   1.0     False   1
5   1.0     False   1
6   1.0     False   1
7   1.0     False   1
8   1.0     False   1
9   3.9      True   2
10  4.0      True   2
11  4.2      True   2
12  4.4      True   2
13  4.6      True   2
14  4.8      True   2
15  3.0     False   3
16  3.0     False   3
17  3.0     False   3
18  3.0     False   3
19  3.2      True   4
20  3.3      True   4
21  3.5      True   4
22  2.1     False   5
23  2.1     False   5
24  2.1     False   5
25  2.1     False   5

I subtracted 1 from the cumsum to follow the expected output. You can then drop the added column as needed.
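
As a minimal sketch, the two steps above can be folded into one helper so that no temporary column has to be dropped afterwards. The function name assign_run_ids is only illustrative and not part of pandas; the logic is the same shift and cumsum combination shown above.

```python
import pandas as pd


def assign_run_ids(values: pd.Series) -> pd.Series:
    """Label alternating 'changing' and 'repeating' stretches with 0, 1, 2, ..."""
    # A row counts as changing only if it differs from both of its neighbours.
    changing = (values != values.shift(-1)) & (values != values.shift())
    # Start a new ID each time the changing flag flips; subtract 1 to start at 0.
    return (changing != changing.shift()).cumsum() - 1


data = [3.5, 3.6, 3.7, 3.8, 1, 1, 1, 1, 1, 3.9, 4.0, 4.2, 4.4, 4.6, 4.8, 3,
        3, 3, 3, 3.2, 3.3, 3.5, 2.1, 2.1, 2.1, 2.1]
df = pd.DataFrame({'A': data})
df['ID'] = assign_run_ids(df['A'])
print(df)
```

One thing to keep in mind with this approach: two different repeated runs that follow each other directly (for example 1, 1, 2, 2) are all flagged as not changing, so they would share a single ID. That case does not occur in the sample data.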

Answer 2

Score: 0

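After several failed attempts, I got close to the output you wanted. The first row of each repeated run (rows 4, 15, and 22) still carries the ID of the preceding run.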

Code

import pandas as pd
data = [3.5, 3.6, 3.7, 3.8, 1, 1, 1, 1, 1, 3.9, 4.0, 4.2, 4.4, 4.6, 4.8, 3, 3, 3, 3, 3.2, 3.3, 3.5, 2.1, 2.1, 2.1, 2.1]
df = pd.DataFrame({'A': data})
# Initialize ID
df['ID'] = 0
# Assign ID
id_value = 0
prev_value = df.at[0, 'A']
value_was_changing = True
for i in range(1, len(df)):
    value_is_changing = df.at[i, 'A'] != prev_value
    if value_is_changing:
        if not value_was_changing:
            id_value += 1
    else:
        if value_was_changing:
            id_value += 1
    df.at[i, 'ID'] = id_value
    prev_value = df.at[i, 'A']
    value_was_changing = value_is_changing
print(df)

Output

      A  ID
0   3.5   0
1   3.6   0
2   3.7   0
3   3.8   0
4   1.0   0
5   1.0   1
6   1.0   1
7   1.0   1
8   1.0   1
9   3.9   2
10  4.0   2
11  4.2   2
12  4.4   2
13  4.6   2
14  4.8   2
15  3.0   2
16  3.0   3
17  3.0   3
18  3.0   3
19  3.2   4
20  3.3   4
21  3.5   4
22  2.1   4
23  2.1   5
24  2.1   5
25  2.1   5

Explanation

I edited the script to accurately track whether the values are continuously changing or staying the same, and assign IDs accordingly.

Added a prev_value variable to keep track of the previous value in the loop.
Added value_was_changing flag to keep track of whether the previous value was changing or not.
Added value_is_changing to determine if the current value is different from the previous value.
The loop increments id_value when there's a transition between changing and not changing states.
Updated prev_value and value_was_changing at the end of each iteration for use in the next iteration.
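
For comparison, here is a sketch of a loop variant that also looks at the next value, so the first row of a repeated run is already treated as repeating. This is an assumption about what the question intends, based on its expected output; the names same_as_prev, same_as_next, and prev_repeating are made up for this example.

```python
import pandas as pd

data = [3.5, 3.6, 3.7, 3.8, 1, 1, 1, 1, 1, 3.9, 4.0, 4.2, 4.4, 4.6, 4.8, 3,
        3, 3, 3, 3.2, 3.3, 3.5, 2.1, 2.1, 2.1, 2.1]
df = pd.DataFrame({'A': data})

values = df['A'].tolist()
ids = []
id_value = 0
prev_repeating = None  # state of the previous row; None before the first row

for i, value in enumerate(values):
    # A row is "repeating" if it equals either of its neighbours.
    same_as_prev = i > 0 and value == values[i - 1]
    same_as_next = i < len(values) - 1 and value == values[i + 1]
    repeating = same_as_prev or same_as_next
    # Move to a new ID whenever the state flips between changing and repeating.
    if prev_repeating is not None and repeating != prev_repeating:
        id_value += 1
    ids.append(id_value)
    prev_repeating = repeating

df['ID'] = ids
print(df)
```

With the sample data this gives rows 4, 15, and 22 the same ID as the rest of their repeated run, which is what the expected output in the question shows.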
