Reading a complex, large text file.

Question

I have a very large text file that I am trying to load into a Jupyter notebook for analysis.

However, I can't find a way to separate the columns. So far I have only worked with HDF5 and CSV files, which are relatively easy to get the hang of.

A link to the data is below:

https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-022-04496-5/MediaObjects/41586_2022_4496_MOESM3_ESM.txt

df1 = pd.read_csv('41586_2022_4496_MOESM3_ESM.txt', delimiter='\t')
print(df1.head(2))

Result:

       1    331.581577     -1.512106  17.774   2.143  -0.828   0.132     104.93    1092.57      45.54     7.355     1.359    -1.468     267695571003410291                   20111024-F5902-01-061    26.9  5520.3    40.0    3.951    0.116    1.581    0.430    2.296    0.188    0.339    0.041
0       2    332.300352     -1.566708   6.780   0...
1       3    331.985497     -1.371940  18.426   1...

Thanks in advance.

Answer 1

Score: 0

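There are no tab characters in your file; judging by the output you posted, the columns are separated by runs of spaces.
Change the delimiter so that it matches whitespace instead of a tab.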

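import pandas as pd

# https://stackoverflow.com/a/19633103/20307768
# r'\s+' is a regular expression that matches one or more whitespace characters.
# Each match is as long as possible, so a whole run of spaces counts as one separator.
# The raw-string prefix (r'...') keeps Python from treating '\s' as a string escape.
df1 = pd.read_csv('41586_2022_4496_MOESM3_ESM.txt', delimiter=r'\s+')
df1.head(2)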
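
To see the difference without downloading the whole file, here is a small sketch that parses the first four columns of the three rows shown in the question from an in-memory string. It assumes the file has no header row, which the posted output suggests because the first data row ended up as the column names; if the real file does have a header line, drop `header=None`. The variable names `sample`, `tab_df` and `ws_df` are only illustrative.

```python
import io

import pandas as pd

# First four columns of the three rows visible in the question's output.
sample = (
    "       1    331.581577     -1.512106  17.774\n"
    "       2    332.300352     -1.566708   6.780\n"
    "       3    331.985497     -1.371940  18.426\n"
)

# A tab delimiter never matches here, so every line stays one single field.
tab_df = pd.read_csv(io.StringIO(sample), delimiter="\t", header=None)

# A whitespace regex splits on each run of spaces.
# header=None (an assumption about the file) keeps the first line as data.
ws_df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)

print(tab_df.shape)  # number of (rows, columns) with the tab delimiter
print(ws_df.shape)   # number of (rows, columns) with the whitespace delimiter
print(ws_df.dtypes)  # numeric columns should be parsed as numbers, not text
```

Comparing the two shapes is a quick way to check whether a delimiter actually splits the lines before loading the full file.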

Posted by huangapple on July 3, 2023.