Using pandas and NumPy to read a CSV and flatten it from 2D to 1D while ignoring the diagonal values
Question
My CSV file looks like this:
0 |0.1|0.2|0.4|
0.1|0 |0.5|0.6|
0.2|0.5|0 |0.9|
0.4|0.6|0.9|0 |
I am trying to read it row by row, ignoring the diagonal values, and write it out as one long column like this:
0.1
0.2
0.4
0.1
0.5
0.6
0.2
0.5
0.9
....
I used this method:
import numpy as np
import pandas as pd
data = pd.read_csv(r"C:\Users\soso-\Desktop\SVM\DataSet\chem_Jacarrd_sim.csv")
row_vector = np.array(data)
result = row_vector.ravel()
result.reshape(299756,1)
df = pd.DataFrame({'chem':result})
df.to_csv("my2.csv")
However, the output skips the first row and still includes the zeros, like this:
0.1
0
0.5
0.6
0.2
0.5
0
0.9
....
How can I fix it?
Answer 1
Score: 0
For the dataframe you have:
0 |0.1|0.2|0.4
0.1|0 |0.5|0.6
0.2|0.5|0 |0.9
0.4|0.6|0.9|0
which I saved as `ffff.csv`, you need to do the following:
import numpy as np
import pandas as pd
# header=None keeps the first line as data instead of using it as column names
data = pd.read_csv("ffff.csv", sep="|", header=None)
print(data)
row_vector = np.array(data)
# Create a new mask with the correct shape, True on the diagonal
mask = np.zeros((row_vector.shape), dtype=bool)
mask[np.arange(row_vector.shape[0]), np.arange(row_vector.shape[0])] = True
# Masked entries are dropped by compressed(), which returns a 1D array
result = np.ma.array(row_vector, mask=mask)
result = result.compressed()
df = pd.DataFrame({'chem':result})
df.to_csv("my2.csv", index=False)
print(df)
The final print(df) returns:
    chem
0    0.1
1    0.2
2    0.4
3    0.1
4    0.5
5    0.6
6    0.2
7    0.5
8    0.9
9    0.4
10   0.6
11   0.9
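The first row went missing in the question because `pd.read_csv` treats the first line as a header unless `header=None` is passed. Here is a minimal sketch of that difference. The `io.StringIO` buffer and the variable names are assumptions for illustration, standing in for the real file on disk:

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file (assumed sample, not the real data set).
csv_text = "0|0.1|0.2|0.4\n0.1|0|0.5|0.6\n0.2|0.5|0|0.9\n0.4|0.6|0.9|0\n"

# Default behaviour: the first line is consumed as column names.
with_header = pd.read_csv(io.StringIO(csv_text), sep="|")

# header=None keeps the first line as a data row.
without_header = pd.read_csv(io.StringIO(csv_text), sep="|", header=None)

print(with_header.shape)     # (3, 4)
print(without_header.shape)  # (4, 4)
```

Note also that `result.reshape(299756, 1)` in the question has no effect, since `reshape` returns a new array rather than modifying `result` in place.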
# Answer 2
**Score**: 0
This one is a bit shorter, assuming you have a 2D NumPy array:
```python
import numpy as np
arr = np.random.rand(3,3)
# array([[0.12964821, 0.92124532, 0.72456772],
#        [0.26063188, 0.1486612 , 0.45312145],
#        [0.04165099, 0.31071689, 0.26935581]])
# ~np.eye(...) is True everywhere except on the diagonal
arr_out = arr[np.where(~np.eye(arr.shape[0],dtype=bool))]
# array([0.92124532, 0.72456772, 0.26063188, 0.45312145, 0.04165099,
#        0.31071689])
```
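Applied to the matrix from the question, the same boolean-mask idea might look like the sketch below. The array is typed in by hand here as an assumption; in practice it would come from something like `pd.read_csv(..., sep="|", header=None).to_numpy()`. Indexing with the boolean mask directly gives the same result as wrapping it in `np.where`.

```python
import numpy as np
import pandas as pd

# Sample similarity matrix from the question, entered by hand for illustration.
arr = np.array([
    [0.0, 0.1, 0.2, 0.4],
    [0.1, 0.0, 0.5, 0.6],
    [0.2, 0.5, 0.0, 0.9],
    [0.4, 0.6, 0.9, 0.0],
])

# True everywhere except on the diagonal.
off_diagonal = ~np.eye(arr.shape[0], dtype=bool)

# Boolean indexing walks the array row by row and returns a 1D array.
flat = arr[off_diagonal]

df = pd.DataFrame({"chem": flat})
print(df["chem"].tolist())
# [0.1, 0.2, 0.4, 0.1, 0.5, 0.6, 0.2, 0.5, 0.9, 0.4, 0.6, 0.9]
```

From there, `df.to_csv("my2.csv", index=False)` would write the single `chem` column without the extra index column.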