pd.read_csv column misalignment

Question


I have a text file containing two columns, but when I read it with pandas, the columns and the data are misaligned.

Here's my code for reference:

import pandas as pd
filename = r'data.txt'
# Read the text file into a pandas DataFrame
df = pd.read_csv(
    filename,
    sep="\t",
    skiprows=3,
    names=['Frequency / MHz', 'S2,1 [SPara1]/abs,dB'],
)

df

Result:

Frequency / MHz	S2,1 [SPara1]/abs,dB
0	0.1423125 -...	NaN
1	0.146625 -...	NaN
2	0.1509375 -...	NaN
3	0.15525 -...	NaN
4	0.1595625 -...	NaN
...	...	...
4059	17.64675 ...	NaN
4060	17.651063 ...	NaN
4061	17.655375 ...	NaN
4062	17.659688 ...	NaN
4063	17.664 ...	NaN

My sample data.txt:

  Frequency / MHz                S2,1 [SPara1]/abs,dB
----------------------------------------------------------------------
                   0.138                     -0.92293553
               0.1423125                     -0.93264485
                0.146625                     -0.94201416
               0.1509375                     -0.95106676
                 0.15525                     -0.95982484
               0.1595625                     -0.96830956
                0.163875                     -0.97654091
               0.1681875                     -0.98453781
                  0.1725                     -0.99231804
               0.1768125                     -0.99989837
                0.181125                      -1.0072945
               0.1854375                      -1.0145212
                 0.18975                      -1.0215921
               0.1940625                      -1.0285203
                0.198375                      -1.0353177
               0.2026875                      -1.0419956
                   0.207                      -1.0485644
               0.2113125                      -1.0550339

I want the 'NaN' values replaced by the original data from the file. What should I change in my code?

Answer 1

Score: 1


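Your file is aligned with spaces, not tabs, so sep="\t" never finds a delimiter and each whole line ends up in the first column. You should change the separator to r"\s\s+", a regular expression that matches two or more consecutive whitespace characters. A regex separator like this is handled by the Python parsing engine, hence engine="python":

df = pd.read_csv(
    filename,
    sep=r"\s\s+",  # raw string, so the backslashes reach the regex engine unchanged
    skiprows=3,
    names=['Frequency / MHz', 'S2,1 [SPara1]/abs,dB'],
    engine="python",
)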

This outputs:

    Frequency / MHz  S2,1 [SPara1]/abs,dB
0          0.142313             -0.932645
1          0.146625             -0.942014
2          0.150938             -0.951067
3          0.155250             -0.959825
4          0.159562             -0.968310
5          0.163875             -0.976541
6          0.168187             -0.984538
7          0.172500             -0.992318
8          0.176813             -0.999898
9          0.181125             -1.007294
10         0.185438             -1.014521
11         0.189750             -1.021592
12         0.194062             -1.028520
13         0.198375             -1.035318
14         0.202687             -1.041996
15         0.207000             -1.048564
16         0.211312             -1.055034
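
To see the difference between the two separators side by side, here is a minimal sketch that parses a few rows of the sample from memory. The SAMPLE string and the bad/good variable names are illustrative assumptions, not part of the original question; the real file is read from data.txt.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for data.txt (first rows of the sample above).
SAMPLE = (
    "  Frequency / MHz                S2,1 [SPara1]/abs,dB\n"
    "----------------------------------------------------------------------\n"
    "                   0.138                     -0.92293553\n"
    "               0.1423125                     -0.93264485\n"
    "                0.146625                     -0.94201416\n"
)
NAMES = ['Frequency / MHz', 'S2,1 [SPara1]/abs,dB']

# No tab exists in the text, so each whole line lands in the first column
# and the second column is filled with NaN.
bad = pd.read_csv(io.StringIO(SAMPLE), sep="\t", skiprows=3, names=NAMES)

# Splitting on runs of two or more whitespace characters separates the fields.
good = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=r"\s\s+",
    skiprows=3,
    names=NAMES,
    engine="python",
)

print(bad.isna().sum())  # number of missing values per column
print(good.dtypes)       # column types after the fix
```

Checking isna() and dtypes like this is a quick way to confirm that a separator change actually split the columns, rather than relying on the truncated DataFrame display.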

Answer 2

Score: 0


I think it is better to use delim_whitespace=True, which splits on any run of whitespace:

df = pd.read_csv(
    filename,
    delim_whitespace=True,
    skiprows=3,
    names=['Frequency / MHz', 'S2,1 [SPara1]/abs,dB'],
)

This outputs:

    Frequency / MHz  S2,1 [SPara1]/abs,dB
0          0.142313             -0.932645
1          0.146625             -0.942014
2          0.150938             -0.951067
3          0.155250             -0.959825
4          0.159562             -0.968310
5          0.163875             -0.976541
6          0.168187             -0.984538
7          0.172500             -0.992318
8          0.176813             -0.999898
9          0.181125             -1.007294
10         0.185438             -1.014521
11         0.189750             -1.021592
12         0.194062             -1.028520
13         0.198375             -1.035318
14         0.202687             -1.041996
15         0.207000             -1.048564
16         0.211312             -1.055034
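
As far as I know, recent pandas releases (2.2 and later) deprecate delim_whitespace in favor of sep=r"\s+", which behaves the same way and works with the default C engine. The sketch below uses that spelling on an in-memory copy of the sample. It also uses skiprows=2 rather than 3: in the sample as shown, the header and the dashed rule occupy only two lines, so skiprows=3 also drops the 0.138 row. Both SAMPLE and the skiprows=2 choice are assumptions based on the excerpt; if the real file has an extra leading line, 3 is the right value.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for data.txt (first rows of the sample above).
SAMPLE = (
    "  Frequency / MHz                S2,1 [SPara1]/abs,dB\n"
    "----------------------------------------------------------------------\n"
    "                   0.138                     -0.92293553\n"
    "               0.1423125                     -0.93264485\n"
    "                0.146625                     -0.94201416\n"
)
NAMES = ['Frequency / MHz', 'S2,1 [SPara1]/abs,dB']

df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=r"\s+",   # any run of whitespace; same effect as delim_whitespace=True
    skiprows=2,   # skip only the header line and the dashed rule
    names=NAMES,
)

print(df)
```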

huangapple
  • Posted on January 9, 2023, 09:33:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75052474.html