Remove float duplicates from a list of tuples created by zip
Question
I create a list of tuples by zipping three lists together, data pairs:
XYZip = list(zip(XaData, Y1aData, Y2aData))
[
(0.001625625, 4.782947316198166, -0.011032947316198166),
(-2.5e-06, 4.783447358402665, 0.020216552641597337),
(0.0008137499999999999, 4.782997384780477, -0.017282997384780476),
(0.00081, 4.783247405882726, 0.020216752594117274),
(0.001625625, 4.782066023993667, -0.011032066023993668),
(0.00324625, 4.780809700135795, 0.03271919029986421),
...,
(19.4121325, 4.653511649011105, 1.1703464883509889)
]
I need to get rid of the tuples where the XaData value is a duplicate, like this one 0.001625625. The whole tuple (0.001625625, 4.782066023993667, -0.011032066023993668) needs to go. Order doesn't matter. I can sort in a second step. I tried set() to no avail.
The data will be fed to scipy.interpolate.CubicSpline, which requires x to be strictly increasing and does not accept duplicates!
I tried...
XYZip = list(zip(XaData, Y1aData, Y2aData))
XYUnique = set(XYZip)
XYSorted = sorted(XYUnique)
XaData, Y1aData, Y2aData = zip(*XYSorted)
...obviously not removing the tuples with the duplicate XaData value.
This is what I need in the first step:
[
(0.001625625, 4.782947316198166, -0.011032947316198166),
(-2.5e-06, 4.783447358402665, 0.020216552641597337),
(0.0008137499999999999, 4.782997384780477, -0.017282997384780476),
(0.00081, 4.783247405882726, 0.020216752594117274),
(0.00324625, 4.780809700135795, 0.03271919029986421),
...,
(19.4121325, 4.653511649011105, 1.1703464883509889)
]
Answer 1
Score: 1
Why a set of tuples does not give the expected result in this case: set() compares whole tuples, and these two tuples match only in their first element.
(0.001625625, 4.782947316198166, -0.011032947316198166)
 same         not the same       not the same
(0.001625625, 4.782066023993667, -0.011032066023993668)
So the two tuples are not equal, and set() keeps both of them.
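A minimal sketch of that comparison, using the two tuples from the question. The variable names are only illustrative; the point is that set() compares whole tuples, so rows that share nothing but their x value both survive.

```python
# Two rows from the question that share the same x value.
a = (0.001625625, 4.782947316198166, -0.011032947316198166)
b = (0.001625625, 4.782066023993667, -0.011032066023993668)

same_x = a[0] == b[0]        # the first elements match
same_tuple = a == b          # the tuples differ in the 2nd and 3rd elements
unique_rows = set([a, b])    # so the set keeps both rows

print(same_x, same_tuple, len(unique_rows))  # True False 2
```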
It is easy and straightforward if you use pandas instead.
Example:
import pandas as pd
x = 1,1,2,3,4,5
y = 1,2,3,4,5,6
z = 1,3,5,7,9,11
pd.DataFrame(zip(x, y, z), columns=["x","y","z"]).drop_duplicates(subset='x', keep="last")
step 1
create DataFrame
df = pd.DataFrame(zip(x, y, z), columns=["x","y","z"])
| | x | y | z |
|---|---|---|---|
| 0 | 1 | 1 | 1 |
| 1 | 1 | 2 | 3 |
| 2 | 2 | 3 | 5 |
| 3 | 3 | 4 | 7 |
| 4 | 4 | 5 | 9 |
| 5 | 5 | 6 | 11 |
step 2
drop duplicates on column x
df = df.drop_duplicates(subset='x', keep="last")
| | x | y | z |
|---|---|---|---|
| 1 | 1 | 2 | 3 |
| 2 | 2 | 3 | 5 |
| 3 | 3 | 4 | 7 |
| 4 | 4 | 5 | 9 |
| 5 | 5 | 6 | 11 |
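The row (1, 1, 1) is dropped because keep="last" retains only the last occurrence of each x value. Use keep="first" to retain the first occurrence instead, as in the desired output in the question.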
Applied to your code:
import pandas as pd
XYZip = list(zip(XaData, Y1aData, Y2aData))
# create the DataFrame
df = pd.DataFrame(XYZip, columns=["XaData","Y1aData","Y2aData"])
# drop rows whose XaData value is a duplicate, keeping the last occurrence
df = df.drop_duplicates(subset='XaData', keep="last")
# convert back to a list of plain tuples if needed
result = list(df.itertuples(index=False, name=None))
# with the example x, y, z above: [(1, 2, 3), (2, 3, 5), (3, 4, 7), (4, 5, 9), (5, 6, 11)]
or
Use a dictionary keyed on x instead.
temp = {i_x: (i_y, i_z) for i_x, i_y, i_z in zip(x, y, z)}
[((i,)+temp[i]) for i in temp]
step 1
Convert x, y, z to a dictionary keyed on x. A later entry with the same key overwrites the earlier one, which removes the duplicates on x.
temp = {i_x: (i_y, i_z) for i_x, i_y, i_z in zip(x, y, z)}
step 2
convert to a list of tuples
[((i,)+temp[i]) for i in temp]
result
[(1, 2, 3), (2, 3, 5), (3, 4, 7), (4, 5, 9), (5, 6, 11)]
# (1, 1, 1) is dropped because it shares its first element with (1, 2, 3), which comes later and overwrites it.
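The dictionary comprehension above keeps the last row for each x. If the first row should win instead, as in the desired output in the question, one possible variant is sketched below. It is an illustration and not part of the original answer: the names first_seen, rows, xs, ys and zs are hypothetical. It also sorts by x and unzips the columns, since the question needs strictly increasing x values.

```python
x = (1, 1, 2, 3, 4, 5)
y = (1, 2, 3, 4, 5, 6)
z = (1, 3, 5, 7, 9, 11)

# setdefault only stores a row if its x value has not been seen yet,
# so the first occurrence of each x wins.
first_seen = {}
for xi, yi, zi in zip(x, y, z):
    first_seen.setdefault(xi, (xi, yi, zi))

# Sort by x so the values are strictly increasing, then unzip the columns.
rows = sorted(first_seen.values(), key=lambda row: row[0])
xs, ys, zs = zip(*rows)

print(rows)  # [(1, 1, 1), (2, 3, 5), (3, 4, 7), (4, 5, 9), (5, 6, 11)]
```

After this step, xs, ys and zs are tuples with unique, ascending x values, which is the shape the spline step in the question expects.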
Answer 2
Score: 0
I solved it by checking if value is already in list like
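x_list, y_list, z_list = [], [], []
for x, y, z in zip(XaData, Y1aData, Y2aData):
    # keep a row only if its x value has not been stored yet
    if x not in x_list:
        x_list.append(x)
        y_list.append(y)
        z_list.append(z)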
That solved it. All constructs with set() failed and scipy spline interpolation threw an error about not strictly increasing x-values.
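The membership check against a list rescans the list for every row, which can get slow for long inputs. A possible variant, sketched here as an assumption and not as the author's code, tracks the x values already seen in a set. The helper name drop_duplicate_x and the sample data are hypothetical.

```python
def drop_duplicate_x(xs, ys, zs):
    """Keep the first row for each x value (hypothetical helper)."""
    seen = set()                      # x values stored so far
    x_list, y_list, z_list = [], [], []
    for x, y, z in zip(xs, ys, zs):
        if x in seen:
            continue                  # skip rows whose x was already stored
        seen.add(x)
        x_list.append(x)
        y_list.append(y)
        z_list.append(z)
    return x_list, y_list, z_list


x_list, y_list, z_list = drop_duplicate_x(
    [3, 1, 3, 2], [30, 10, 31, 20], [300, 100, 301, 200]
)
print(x_list, y_list, z_list)  # [3, 1, 2] [30, 10, 20] [300, 100, 200]
```

The rows come back in their original order, so they would still need to be sorted by x before being passed to a spline that requires strictly increasing x values.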