Reading a complex, large text file
Question
I have a very large text file that I am trying to load into a Jupyter notebook for analysis.
But I can't seem to find a way to separate the columns. So far I have only worked with HDF5 and CSV files, which are relatively easy to get the hang of.
I will attach a link to the data below.
df1 = pd.read_csv('41586_2022_4496_MOESM3_ESM.txt', delimiter='\t')
print(df1.head(2))
Result:
1 331.581577 -1.512106 17.774 2.143 -0.828 0.132 104.93 1092.57 45.54 7.355 1.359 -1.468 267695571003410291 20111024-F5902-01-061 26.9 5520.3 40.0 3.951 0.116 1.581 0.430 2.296 0.188 0.339 0.041
0 2 332.300352 -1.566708 6.780 0...
1 3 331.985497 -1.371940 18.426 1...
Thanks in advance
Answer 1
Score: 0
There are no tabs in your file; the columns are separated by whitespace.
Change the delimiter.
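import pandas as pd
# https://stackoverflow.com/a/19633103/20307768
# r'\s+' matches one or more whitespace characters; matches are greedy,
# so each run of spaces is treated as a single delimiter.
# header=None is an assumption: the output in the question suggests the file has no header row.
df1 = pd.read_csv('41586_2022_4496_MOESM3_ESM.txt', delimiter=r'\s+', header=None)
df1.head(2)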
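To check the delimiter without loading the whole file, here is a minimal sketch that parses the one complete row shown in the question from an in-memory string. The names `sample` and `df` are made up for illustration, and `header=None` assumes the file has no header row, which the question's output suggests but does not confirm.

```python
import io

import pandas as pd

# The single complete row shown in the question; the real file has many more rows.
sample = (
    "1 331.581577 -1.512106 17.774 2.143 -0.828 0.132 104.93 1092.57 45.54 "
    "7.355 1.359 -1.468 267695571003410291 20111024-F5902-01-061 26.9 5520.3 "
    "40.0 3.951 0.116 1.581 0.430 2.296 0.188 0.339 0.041\n"
)

# sep=r'\s+' splits on any run of whitespace.
# header=None (an assumption) keeps the first row as data instead of column names.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)

print(df.shape)   # number of rows and columns that were parsed
print(df.dtypes)  # numeric columns are inferred; the ID-like column stays a string
```

If the column count matches what you expect, the same `sep` and `header` arguments should carry over to the real file.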