
pd.read_csv column misalignment

Question


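One suggestion for turning 'NaN' back into the original data is to pass the na_values parameter, so that 'NaN' is treated as a missing-value marker when the file is read:

    import pandas as pd

    filename = r'data.txt'
    # Read the text file into a pandas DataFrame, treating the string 'NaN' as missing
    df = pd.read_csv(
        filename,
        sep="\t",
        skiprows=3,
        names=['Frequency / MHz', 'S2,1 [SPara1]/abs,dB'],
        na_values='NaN',  # treat 'NaN' as a missing value
    )
    df

This does not fix the misalignment, though. na_values only decides which strings in the file are parsed as missing (and 'NaN' is already one of pandas' default markers). The NaN values here are not in data.txt at all: the file is not tab-separated, so with sep="\t" each whole line is read into the first column and the second column is left empty. The separator is what needs to change.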


I have a text file containing two columns, but when I read the data with pandas, the columns and the data are misaligned and the second column comes out as NaN.

Here's my code for reference:

    import pandas as pd

    filename = r'data.txt'
    # read text file into pandas DataFrame
    df = pd.read_csv(
        filename,
        sep="\t",
        skiprows=3,
        names=['Frequency / MHz', 'S2,1 [SPara1]/abs,dB'],
    )
    df

Result:

             Frequency / MHz  S2,1 [SPara1]/abs,dB
    0         0.1423125 -...                   NaN
    1          0.146625 -...                   NaN
    2         0.1509375 -...                   NaN
    3           0.15525 -...                   NaN
    4         0.1595625 -...                   NaN
    ...                  ...                   ...
    4059        17.64675 ...                   NaN
    4060       17.651063 ...                   NaN
    4061       17.655375 ...                   NaN
    4062       17.659688 ...                   NaN
    4063          17.664 ...                   NaN

My sample data.txt:

    Frequency / MHz S2,1 [SPara1]/abs,dB
    ----------------------------------------------------------------------
    0.138 -0.92293553
    0.1423125 -0.93264485
    0.146625 -0.94201416
    0.1509375 -0.95106676
    0.15525 -0.95982484
    0.1595625 -0.96830956
    0.163875 -0.97654091
    0.1681875 -0.98453781
    0.1725 -0.99231804
    0.1768125 -0.99989837
    0.181125 -1.0072945
    0.1854375 -1.0145212
    0.18975 -1.0215921
    0.1940625 -1.0285203
    0.198375 -1.0353177
    0.2026875 -1.0419956
    0.207 -1.0485644
    0.2113125 -1.0550339

I want the data 'NaN' to change to its original data. What should I change in my code?
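The result above can be reproduced without the original file. The sketch below is a minimal reproduction that feeds a few lines of the sample into pd.read_csv through io.StringIO, using the same arguments as the question. It assumes the columns in data.txt are separated by runs of spaces rather than tabs (the scraped sample does not show the exact whitespace), and SAMPLE and NAMES are illustrative names only. Since no line contains a tab, every line becomes one string in 'Frequency / MHz' (displayed truncated as "-..." above) and the second column is all NaN.

```python
import io

import pandas as pd

# Hypothetical excerpt of data.txt; assumes spaces (not tabs) between the columns.
SAMPLE = (
    "Frequency / MHz    S2,1 [SPara1]/abs,dB\n"
    "----------------------------------------------------------------------\n"
    "0.138    -0.92293553\n"
    "0.1423125    -0.93264485\n"
    "0.146625    -0.94201416\n"
)
NAMES = ['Frequency / MHz', 'S2,1 [SPara1]/abs,dB']

# Same call as in the question: tab separator, first three lines skipped.
df = pd.read_csv(io.StringIO(SAMPLE), sep="\t", skiprows=3, names=NAMES)

print(df)
# The whole line ends up as a single string in the first column ...
print(repr(df.loc[0, 'Frequency / MHz']))
# ... and the second column receives no values at all.
print(df['S2,1 [SPara1]/abs,dB'].isna().all())
```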

Answer 1

Score: 1


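You should change the separator you're using to the regular expression r"\s+" (one or more whitespace characters). r"\s\s+" (two or more) also works when the columns are always separated by at least two spaces: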

    df = pd.read_csv(
        filename,
        sep=r"\s+",  # raw string, so "\s" is not treated as a string escape
        skiprows=3,
        names=['Frequency / MHz', 'S2,1 [SPara1]/abs,dB'],
        engine="python",
    )

This outputs:

        Frequency / MHz  S2,1 [SPara1]/abs,dB
    0          0.142313             -0.932645
    1          0.146625             -0.942014
    2          0.150938             -0.951067
    3          0.155250             -0.959825
    4          0.159562             -0.968310
    5          0.163875             -0.976541
    6          0.168187             -0.984538
    7          0.172500             -0.992318
    8          0.176813             -0.999898
    9          0.181125             -1.007294
    10         0.185438             -1.014521
    11         0.189750             -1.021592
    12         0.194062             -1.028520
    13         0.198375             -1.035318
    14         0.202687             -1.041996
    15         0.207000             -1.048564
    16         0.211312             -1.055034
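The two separators mentioned in this answer can be compared on an in-memory copy of the sample. This is a sketch under the assumption that the columns are separated by runs of spaces (four here). pandas handles r"\s+" specially, so it also works with the default C engine. r"\s\s+" is a general regular expression, so it runs on the Python engine; without engine="python", pandas would fall back to that engine and emit a ParserWarning.

```python
import io

import pandas as pd

# Hypothetical excerpt of data.txt; assumes runs of spaces between the columns.
SAMPLE = (
    "Frequency / MHz    S2,1 [SPara1]/abs,dB\n"
    "----------------------------------------------------------------------\n"
    "0.138    -0.92293553\n"
    "0.1423125    -0.93264485\n"
    "0.146625    -0.94201416\n"
)
NAMES = ['Frequency / MHz', 'S2,1 [SPara1]/abs,dB']

# "One or more whitespace characters": supported directly by the C engine.
df_any = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", skiprows=3, names=NAMES)

# "Two or more whitespace characters": a general regex, parsed by the Python engine.
df_two = pd.read_csv(
    io.StringIO(SAMPLE), sep=r"\s\s+", skiprows=3, names=NAMES, engine="python"
)

print(df_any)
print(df_any.dtypes)  # both columns are parsed as float64

# Both separators give the same frame for this sample (floats compared with a tolerance).
pd.testing.assert_frame_equal(df_any, df_two)
```

Values such as 0.142313 in the output above come from pandas' default display precision of six decimal places. The stored value is still 0.1423125.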

Answer 2

Score: 0


I think it is better to use delim_whitespace=True (deprecated since pandas 2.2; on newer versions the equivalent is sep=r"\s+"):

    df = pd.read_csv(
        filename,
        delim_whitespace=True,
        skiprows=3,
        names=['Frequency / MHz', 'S2,1 [SPara1]/abs,dB'],
    )

This outputs:

        Frequency / MHz  S2,1 [SPara1]/abs,dB
    0          0.142313             -0.932645
    1          0.146625             -0.942014
    2          0.150938             -0.951067
    3          0.155250             -0.959825
    4          0.159562             -0.968310
    5          0.163875             -0.976541
    6          0.168187             -0.984538
    7          0.172500             -0.992318
    8          0.176813             -0.999898
    9          0.181125             -1.007294
    10         0.185438             -1.014521
    11         0.189750             -1.021592
    12         0.194062             -1.028520
    13         0.198375             -1.035318
    14         0.202687             -1.041996
    15         0.207000             -1.048564
    16         0.211312             -1.055034
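The outputs in both answers start at 0.142313, while the sample data.txt starts at 0.138. If the file really begins with the header line followed by the dashed line, skiprows=3 also skips the first data row, and skiprows=2 would keep it. If the real file has an extra line at the top, skiprows=3 is correct. The sketch below assumes the layout of the sample and runs on a whitespace separator (sep=r"\s+"). The helper load is a hypothetical name.

```python
import io

import pandas as pd

# Hypothetical excerpt of data.txt laid out like the sample: header, dashed rule, data.
SAMPLE = (
    "Frequency / MHz    S2,1 [SPara1]/abs,dB\n"
    "----------------------------------------------------------------------\n"
    "0.138    -0.92293553\n"
    "0.1423125    -0.93264485\n"
    "0.146625    -0.94201416\n"
)
NAMES = ['Frequency / MHz', 'S2,1 [SPara1]/abs,dB']


def load(skiprows: int) -> pd.DataFrame:
    """Read the sample with a whitespace separator, skipping `skiprows` leading lines."""
    return pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", skiprows=skiprows, names=NAMES)


# skiprows=3 skips the header, the dashed line and the first data row (0.138) ...
print(load(3).iloc[0, 0])
# ... while skiprows=2 skips only the header and the dashed line.
print(load(2).iloc[0, 0])
```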

huangapple
  • Posted on 9 January 2023 at 09:33:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75052474.html