July 3, 2023, 02:11:28

Reading a complex, large text file

Question

I have a very large text file that I am trying to load into a Jupyter notebook for analysis.

However, I can't find a way to separate the columns. So far I have only worked with HDF5 and CSV files, which are relatively easy to get the hang of.

The data is available at the link below:

https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-022-04496-5/MediaObjects/41586_2022_4496_MOESM3_ESM.txt

import pandas as pd

df1 = pd.read_csv('41586_2022_4496_MOESM3_ESM.txt', delimiter='\t')
print(df1.head(2))

Result:

       1    331.581577     -1.512106  17.774   2.143  -0.828   0.132     104.93    1092.57      45.54     7.355     1.359    -1.468     267695571003410291                   20111024-F5902-01-061    26.9  5520.3    40.0    3.951    0.116    1.581    0.430    2.296    0.188    0.339    0.041
0       2    332.300352     -1.566708   6.780   0...
1       3    331.985497     -1.371940  18.426   1...

Thanks in advance

Answer 1

Score: 0

The file contains no tab characters; its columns are separated by runs of spaces.
Change the delimiter to match.

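import pandas as pd

# See https://stackoverflow.com/a/19633103/20307768
# r'\s+' matches one or more whitespace characters, so each run of spaces is treated as a single separator.
# The first line of the file appears to be data (see the output in the question),
# so header=None keeps it from being used as the column names.
df1 = pd.read_csv('41586_2022_4496_MOESM3_ESM.txt', delimiter=r'\s+', header=None)
df1.head(2)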
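
To see why the tab delimiter produces a single wide column, here is a minimal sketch that parses a small made-up sample with both delimiters. The sample rows and the variable names (sample, as_tab, as_space) are assumptions for illustration, not lines taken from the actual file.

```python
import io
import pandas as pd

# Hypothetical sample: space-aligned columns, no tabs, no header row.
sample = (
    "1    331.58   -1.51   20111024-F5902-01-061\n"
    "2    332.30   -1.57   20111024-F5902-01-062\n"
)

# With a tab delimiter, no separator is ever found,
# so each line ends up as one long string in a single column.
as_tab = pd.read_csv(io.StringIO(sample), delimiter='\t', header=None)

# With r'\s+', every run of whitespace is one separator, giving four columns.
as_space = pd.read_csv(io.StringIO(sample), delimiter=r'\s+', header=None)

print(as_tab.shape)    # (2, 1)
print(as_space.shape)  # (2, 4)
```

For a very large file, passing nrows (for example nrows=5) to read_csv is a quick way to check that the columns split correctly before loading everything.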
