pd.read_csv column misalignment
Question
I have a text file containing two columns, but when I read it with pandas, the columns and the data are misaligned.
Here's my code for reference:
import pandas as pd

filename = r'data.txt'

# Read the text file into a pandas DataFrame
df = pd.read_csv(
    filename,
    sep="\t",
    skiprows=3,
    names=['Frequency / MHz', 'S2,1 [SPara1]/abs,dB'],
)
df
Result:
Frequency / MHz S2,1 [SPara1]/abs,dB
0 0.1423125 -... NaN
1 0.146625 -... NaN
2 0.1509375 -... NaN
3 0.15525 -... NaN
4 0.1595625 -... NaN
... ... ...
4059 17.64675 ... NaN
4060 17.651063 ... NaN
4061 17.655375 ... NaN
4062 17.659688 ... NaN
4063 17.664 ... NaN
My sample data.txt:
Frequency / MHz S2,1 [SPara1]/abs,dB
----------------------------------------------------------------------
0.138 -0.92293553
0.1423125 -0.93264485
0.146625 -0.94201416
0.1509375 -0.95106676
0.15525 -0.95982484
0.1595625 -0.96830956
0.163875 -0.97654091
0.1681875 -0.98453781
0.1725 -0.99231804
0.1768125 -0.99989837
0.181125 -1.0072945
0.1854375 -1.0145212
0.18975 -1.0215921
0.1940625 -1.0285203
0.198375 -1.0353177
0.2026875 -1.0419956
0.207 -1.0485644
0.2113125 -1.0550339
I want the 'NaN' values in the second column to be replaced by the original data from the file. What should I change in my code?
Answer 1
Score: 1
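Your file separates the columns with runs of spaces, not tabs, so sep="\t" puts each whole line into the first column and leaves the second as NaN. You should change the separator to "\s\s+" (two or more whitespace characters), written as a raw string so that the backslashes are not treated as escape sequences:
df = pd.read_csv(
    filename,
    sep=r"\s\s+",
    skiprows=3,
    names=['Frequency / MHz', 'S2,1 [SPara1]/abs,dB'],
    engine="python",
)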
This outputs:
Frequency / MHz S2,1 [SPara1]/abs,dB
0 0.142313 -0.932645
1 0.146625 -0.942014
2 0.150938 -0.951067
3 0.155250 -0.959825
4 0.159562 -0.968310
5 0.163875 -0.976541
6 0.168187 -0.984538
7 0.172500 -0.992318
8 0.176813 -0.999898
9 0.181125 -1.007294
10 0.185438 -1.014521
11 0.189750 -1.021592
12 0.194062 -1.028520
13 0.198375 -1.035318
14 0.202687 -1.041996
15 0.207000 -1.048564
16 0.211312 -1.055034
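To see the difference without touching the real file, here is a minimal sketch that reads the same kind of text from an in-memory string. The SAMPLE text is a hypothetical stand-in for data.txt (I am assuming the columns are padded with several spaces), and skiprows=2 is used only because this sample has exactly two header lines.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.txt: columns padded with spaces, no tabs.
SAMPLE = (
    "Frequency / MHz          S2,1 [SPara1]/abs,dB\n"
    "----------------------------------------------\n"
    "0.138                    -0.92293553\n"
    "0.1423125                -0.93264485\n"
    "0.146625                 -0.94201416\n"
)
NAMES = ["Frequency / MHz", "S2,1 [SPara1]/abs,dB"]

# No line contains a tab, so each whole line becomes one field
# and the second column is filled with NaN.
tab_df = pd.read_csv(io.StringIO(SAMPLE), sep="\t", skiprows=2, names=NAMES)

# Split on two or more whitespace characters instead.
# A regex separator like this one needs the python engine.
ws_df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=r"\s\s+",
    skiprows=2,
    names=NAMES,
    engine="python",
)

print(tab_df)
print(ws_df)
```

The first frame reproduces the NaN column from the question, while the second has two numeric columns.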
Answer 2
Score: 0
I think it is better to use delim_whitespace=True:
df = pd.read_csv(
    filename,
    delim_whitespace=True,
    skiprows=3,
    names=['Frequency / MHz', 'S2,1 [SPara1]/abs,dB'],
)
This outputs:
Frequency / MHz S2,1 [SPara1]/abs,dB
0 0.142313 -0.932645
1 0.146625 -0.942014
2 0.150938 -0.951067
3 0.155250 -0.959825
4 0.159562 -0.968310
5 0.163875 -0.976541
6 0.168187 -0.984538
7 0.172500 -0.992318
8 0.176813 -0.999898
9 0.181125 -1.007294
10 0.185438 -1.014521
11 0.189750 -1.021592
12 0.194062 -1.028520
13 0.198375 -1.035318
14 0.202687 -1.041996
15 0.207000 -1.048564
16 0.211312 -1.055034
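One caveat: as far as I know, delim_whitespace is deprecated in recent pandas releases (2.2 and later) in favor of sep=r"\s+", which should behave the same way and still uses the default C engine. A minimal sketch of that variant follows, again with a hypothetical in-memory SAMPLE standing in for data.txt and skiprows=2 chosen to match its two header lines.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.txt, mixing different whitespace widths.
SAMPLE = (
    "Frequency / MHz   S2,1 [SPara1]/abs,dB\n"
    "--------------------------------------\n"
    "0.138 -0.92293553\n"
    "0.1423125    -0.93264485\n"
    "0.146625\t-0.94201416\n"
)

df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=r"\s+",  # any run of whitespace (spaces or tabs) separates fields
    skiprows=2,
    names=["Frequency / MHz", "S2,1 [SPara1]/abs,dB"],
)

print(df)
print(df.dtypes)
```

Because a single space also counts as a delimiter here, this only works when the header lines (which contain single spaces inside the column names) are skipped and the names are passed explicitly, as in the question.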