pandas and NumPy to read a CSV and convert it from a 2D array to 1D, ignoring the diagonal values

# Question

My CSV file looks like this:

    0  |0.1|0.2|0.4|
    0.1|0  |0.5|0.6|
    0.2|0.5|0  |0.9|
    0.4|0.6|0.9|0  |

I am trying to read it row by row, skip the diagonal values, and write the result as one long column, like this:

    0.1
    0.2
    0.4
    0.1
    0.5
    0.6
    0.2
    0.5
    0.9
    .... 

I used this approach:

    import numpy as np
    import pandas as pd
    
    
    data = pd.read_csv(r"C:\Users\soso-\Desktop\SVM\DataSet\chem_Jacarrd_sim.csv")
    row_vector = np.array(data)
    result = row_vector.ravel()
    result.reshape(299756,1)
    df = pd.DataFrame({'chem':result})
    df.to_csv("my2.csv")

However, the output skips the first row and still includes the diagonal zeros, as shown below. How can I fix it?

    0.1
    0
    0.5
    0.6
    0.2
    0.5
    0
    0.9
    ....
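Both symptoms can be reproduced with a small in-memory copy of the sample matrix. This sketch assumes the real file is plain comma-separated with no header line; that is a guess, since the question only shows a `|`-formatted view. By default `pd.read_csv` treats the first line as column names, so only three rows of data remain, and `ravel()` flattens every remaining cell, so the original diagonal zeros stay in, which matches the output above.

```python
import io

import pandas as pd

# Assumed file contents: comma-separated, four rows, no header line.
SAMPLE = "0,0.1,0.2,0.4\n0.1,0,0.5,0.6\n0.2,0.5,0,0.9\n0.4,0.6,0.9,0\n"

# Default header=0: the first line becomes the column names, not data.
as_header = pd.read_csv(io.StringIO(SAMPLE))
print(as_header.shape)  # (3, 4): the first row is gone

# ravel() flattens every remaining cell, so the original diagonal zeros are kept.
print(as_header.to_numpy().ravel())

# header=None keeps all four lines as data rows.
no_header = pd.read_csv(io.StringIO(SAMPLE), header=None)
print(no_header.shape)  # (4, 4)
```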

# Answer 1

**Score**: 0

For the dataframe you have:

    0 |0.1|0.2|0.4
    0.1|0 |0.5|0.6
    0.2|0.5|0 |0.9
    0.4|0.6|0.9|0

I saved it as `ffff.csv`; then you need to do the following:

    import numpy as np
    import pandas as pd

    # header=None so the first line is read as data, not as column names
    data = pd.read_csv("ffff.csv", sep="|", header=None)
    print(data)
    row_vector = np.array(data)

    # Create a new mask with the correct shape: True only on the diagonal
    mask = np.zeros(row_vector.shape, dtype=bool)
    mask[np.arange(row_vector.shape[0]), np.arange(row_vector.shape[0])] = True

    # compressed() drops the masked (diagonal) entries, reading row by row
    result = np.ma.array(row_vector, mask=mask)
    result = result.compressed()

    df = pd.DataFrame({'chem': result})
    df.to_csv("my2.csv", index=False)
    print(df)


which returns:

        chem
    0    0.1
    1    0.2
    2    0.4
    3    0.1
    4    0.5
    5    0.6
    6    0.2
    7    0.5
    8    0.9
    9    0.4
    10   0.6
    11   0.9
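The masked-array approach above works; a slightly more direct variant indexes the array with the inverted identity matrix, which is the same idea as Answer 2 without the `np.where` call. The sketch below also drops the all-NaN column that a trailing `|` on each line would produce (as in the question's sample), checks that the matrix is square, and writes one value per line with no index or header. The function name `off_diagonal_column` is hypothetical, and the separator is a parameter because the question does not make clear whether the real file uses `|` or commas.

```python
import io

import numpy as np
import pandas as pd


def off_diagonal_column(csv_source, sep=","):
    """Hypothetical helper: off-diagonal values of a square CSV matrix, row by row."""
    # header=None: the first line is data, not column names.
    frame = pd.read_csv(csv_source, sep=sep, header=None)
    # A trailing separator on each line (e.g. "0|0.1|0.2|0.4|") yields an
    # all-NaN last column; drop any column that is entirely empty.
    frame = frame.dropna(axis=1, how="all")
    values = frame.to_numpy(dtype=float)
    n_rows, n_cols = values.shape
    if n_rows != n_cols:
        raise ValueError(f"expected a square matrix, got shape {values.shape}")
    # Boolean indexing with the inverted identity keeps the off-diagonal
    # cells in row-major order, so no np.ma or np.where is needed.
    return values[~np.eye(n_rows, dtype=bool)]


sample = "0|0.1|0.2|0.4|\n0.1|0|0.5|0.6|\n0.2|0.5|0|0.9|\n0.4|0.6|0.9|0|\n"
column = off_diagonal_column(io.StringIO(sample), sep="|")

# One value per line, no index and no header, like the desired output.
out = io.StringIO()
pd.Series(column).to_csv(out, index=False, header=False)
print(out.getvalue())
```

For the real file, pass its path instead of the `StringIO` object (with `sep=","` if it is comma-separated), and pass a filename such as `"my2.csv"` to `to_csv`.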





# Answer 2
**Score**: 0

This one is a bit shorter, assuming you have a 2D NumPy array:

```python
import numpy as np
arr = np.random.rand(3,3)

# array([[0.12964821, 0.92124532, 0.72456772],
#        [0.26063188, 0.1486612 , 0.45312145],
#        [0.04165099, 0.31071689, 0.26935581]])

arr_out = arr[np.where(~np.eye(arr.shape[0],dtype=bool))]

# array([0.92124532, 0.72456772, 0.26063188, 0.45312145, 0.04165099,
#        0.31071689])
```

This article was posted by huangapple on February 8, 2023, 20:33:36.
When reposting, please keep the link to this article: https://go.coder-hub.com/75385853.html