February 23, 2023 22:56:32

Opening textfile in python

Question

The structure of the data file looks like this:

  2.0  0    3    9.15400
      5.40189    0.77828    0.66432
      0.44219    0.00000
  2.0  0    1    9.15400
      0.00000
  2.0  0    6    9.15400
      7.38451    3.99120    2.23459    1.49781    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.65828
      0.58990    0.00000

and so on

I want to create a data frame that should look like this:

9.15400    5.40189    0.77828    0.66432    0.44219    0.00000
9.15400    0.00000
9.15400    7.38451    3.99120    2.23459    1.49781    0.77828    0.00000
9.15400    2.09559    0.77828    0.00000
9.15400    2.09559    0.77828    0.65828    0.58990    0.00000

Can someone please help me get started with this?

Answer 1

Score: 0

Why doesn't the 0th row of your desired dataframe include 0.66432?

It's unclear how the table is structured. If it is as irregular as shown in your question, try this:

Input:

  2.0  0    3    9.15400
      5.40189    0.77828    0.66432
      0.44219    0.00000
  2.0  0    1    9.15400
      0.00000
  2.0  0    6    9.15400
      7.38451    3.99120    2.23459    1.49781    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.65828
      0.58990    0.00000


import pandas as pd

filename = r"C:\Users\Bobson Dugnutt\Desktop\table2.txt"

# This returns a dataframe with a single column
df = pd.read_table(filename, header=None)

# Split the bad-boy at every double space
df = df[0].str.split("  ", expand=True)

new_rows = []
values = []

for row in df.itertuples():
    # row[2] is the column which is either 2.0 or blank ("")
    # (itertuples() puts the index at position 0)
    if row[2]:
        if values:
            new_rows.append(values)
            values = []

        # row[7] is the column with a value like 9.15400
        values.append(row[7])
    else:
        # Add all non-blank values starting from row[4] (the 4th column),
        # the first column in which meaningful values appear for these rows.
        # pd.notna() also skips the padding that expand=True adds
        # (None or NaN, depending on the pandas version)
        values.extend(value for value in row[4:] if pd.notna(value) and value)

new_rows.append(values)

# fillna("") to make the NaN values blank
new_df = pd.DataFrame(new_rows).fillna("")
print(new_df)

Output:

        0       1       2       3       4       5       6
0 9.15400 5.40189 0.77828 0.66432 0.44219 0.00000        
1 9.15400 0.00000                                        
2 9.15400 7.38451 3.99120 2.23459 1.49781 0.77828 0.00000
3 9.15400 2.09559 0.77828 0.00000                        
4 9.15400 2.09559 0.77828 0.65828 0.58990 0.00000
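To see where the indices `row[2]`, `row[7]` and `row[4:]` in the snippet above come from, you can run the same double-space split on two lines of the sample. This is only a small illustration that uses an in-memory `pd.Series` instead of the file: splitting on `"  "` turns the padding into empty-string fields, and `itertuples()` shifts every column one position to the right because the index comes first.

```python
import pandas as pd

# One header line and one continuation line copied from the question's sample
lines = pd.Series([
    "  2.0  0    3    9.15400",
    "      5.40189    0.77828    0.66432",
])

# Same split as the answer: each run of two spaces is a separator,
# so the padding becomes empty-string fields
df = lines.str.split("  ", expand=True)
print(df)

header, continuation = df.itertuples()

# itertuples() puts the index at position 0, so DataFrame column k is tuple position k + 1
print(header[2], header[7])                  # prints: 2.0 9.15400
print([v for v in continuation[4:] if v])    # prints: ['5.40189', '0.77828', '0.66432']
```

The header line splits into 7 fields and the continuation line into 8, which is why the first value of a continuation line sits at `row[4]`.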

Or if each column is fixed width, and is just not displayed correctly in your question, try this:

Input:

  2.0  0    3    9.15400
                 5.40189    0.77828    0.66432
                 0.44219    0.00000
  2.0  0    1    9.15400
                 0.00000
  2.0  0    6    9.15400
                 7.38451    3.99120    2.23459    1.49781    0.77828    0.00000
  2.0  0    3    9.15400
                 2.09559    0.77828    0.00000
  2.0  0    3    9.15400
                 2.09559    0.77828    0.65828
                 0.58990    0.00000

import pandas as pd

# Make the floats the same width as your desired output
pd.options.display.float_format = '{:.5f}'.format

filename = r"C:\Users\Bobson Dugnutt\Desktop\table.txt"

df = pd.read_fwf(filename, header=None).fillna("")

new_rows = []
values = []

for row in df.itertuples():
    # row[1] is the column which is either 2.0 or blank ("")
    # (itertuples() puts the index at position 0)
    if row[1] != "":
        if values:
            new_rows.append(values)
            values = []

        # row[4] is the column with a value like 9.15400
        values.append(row[4])
    else:
        # Add all non-blank values starting from the same column as above
        values.extend(value for value in row[4:] if value != "")

new_rows.append(values)

# fillna("") to make the NaN values blank
new_df = pd.DataFrame(new_rows).astype(float).fillna("")

print(new_df)

Output:

        0       1       2       3       4       5       6
0 9.15400 5.40189 0.77828 0.66432 0.44219 0.00000        
1 9.15400 0.00000                                        
2 9.15400 7.38451 3.99120 2.23459 1.49781 0.77828 0.00000
3 9.15400 2.09559 0.77828 0.00000                        
4 9.15400 2.09559 0.77828 0.65828 0.58990 0.00000
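The row-by-row loop in the fixed-width version can also be expressed with a grouping key. This is a sketch, assuming the file parses with `read_fwf` into the same columns as above (the 2.0 flag in column 0, the values starting in column 3). `notna().cumsum()` gives each header row and the continuation rows after it the same record id, and each group is then flattened and stripped of `NaN`. The sample is read from an `io.StringIO` here only so the snippet runs without the file.

```python
import io

import pandas as pd

sample = """\
  2.0  0    3    9.15400
                 5.40189    0.77828    0.66432
                 0.44219    0.00000
  2.0  0    1    9.15400
                 0.00000
  2.0  0    6    9.15400
                 7.38451    3.99120    2.23459    1.49781    0.77828    0.00000
  2.0  0    3    9.15400
                 2.09559    0.77828    0.00000
  2.0  0    3    9.15400
                 2.09559    0.77828    0.65828
                 0.58990    0.00000
"""

df = pd.read_fwf(io.StringIO(sample), header=None)

# A non-blank column 0 (the 2.0 flag) marks a header row; the running count of
# header rows labels each header and its continuation rows with one record id
record_id = df[0].notna().cumsum()

rows = []
for _, group in df.iloc[:, 3:].groupby(record_id):
    # Flatten the group row by row (header value first) and drop the blank cells
    values = group.to_numpy().ravel()
    rows.append(values[~pd.isna(values)].tolist())

result = pd.DataFrame(rows)
print(result)
```

Here the values stay floats throughout, so the `NaN` padding in `result` can be left as-is or blanked with `fillna("")` for display, as in the loop version.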

In both cases, all you have to do is iterate over the rows and check whether one of the first columns is non-blank (i.e. holds 2.0). If it is, start a new record with that row's last value (like 9.15400), then collect every other meaningful value from the following rows until you reach the next such row. Only the specific column indices differ, depending on how the table was originally parsed.
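The same idea also works without relying on column positions at all. Below is a plain-Python sketch, assuming (as in both samples in this thread) that header lines are the least-indented lines in the file and that every more-indented line continues the header above it. `parse_records` and `indentation` are hypothetical helper names; values are split on any whitespace and converted to `float`.

```python
import io

import pandas as pd


def indentation(line):
    """Number of leading whitespace characters on a line."""
    return len(line) - len(line.lstrip())


def parse_records(lines):
    """Return one list of floats per record: the header's last value, then its continuation values.

    Assumes header lines are the least-indented lines and that the first
    non-blank line is a header.
    """
    lines = [line.rstrip("\n") for line in lines if line.strip()]  # skip blank lines
    header_indent = min(indentation(line) for line in lines)

    rows = []
    for line in lines:
        fields = [float(token) for token in line.split()]
        if indentation(line) == header_indent:
            rows.append([fields[-1]])  # new record, starting with a value like 9.15400
        else:
            rows[-1].extend(fields)    # continuation line: append all of its values
    return rows


sample = """\
  2.0  0    3    9.15400
      5.40189    0.77828    0.66432
      0.44219    0.00000
  2.0  0    1    9.15400
      0.00000
"""

# With a real file: with open(filename) as f: rows = parse_records(f)
rows = parse_records(io.StringIO(sample))
df = pd.DataFrame(rows)  # shorter records are padded with NaN
print(df)
```

Because it keys on indentation rather than on fixed column offsets, this version should handle both the irregular and the fixed-width layouts shown above, as long as the indentation assumption holds.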
