Opening a text file in Python

Question

The structure of the data file looks like this:

  2.0  0    3    9.15400
      5.40189    0.77828    0.66432
      0.44219    0.00000
  2.0  0    1    9.15400
      0.00000
  2.0  0    6    9.15400
      7.38451    3.99120    2.23459    1.49781    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.65828
      0.58990    0.00000

and so on

I want to create a data frame that should look like this:

9.15400    5.40189    0.77828    0.66432    0.44219    0.00000
9.15400    0.00000
9.15400    7.38451    3.99120    2.23459    1.49781    0.77828    0.000
9.15400    2.09559    0.77828    0.00000
9.15400    2.09559    0.77828    0.65828    0.58990    0.00000

Can someone please help me get started with this?

Answer 1

Score: 0

Why doesn't the 0th row of your desired dataframe include 0.66432?

It's unclear how the table is structured. If it is as unstructured as is shown in your question, try this:

Input:

  2.0  0    3    9.15400
      5.40189    0.77828    0.66432
      0.44219    0.00000
  2.0  0    1    9.15400
      0.00000
  2.0  0    6    9.15400
      7.38451    3.99120    2.23459    1.49781    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.65828
      0.58990    0.00000

import pandas as pd

filename = r"C:\Users\Bobson Dugnutt\Desktop\table2.txt"

# This returns a dataframe with a single column
df = pd.read_table(filename, header=None)

# Split the bad-boy at every double space
df = df[0].str.split("  ", expand=True)

new_rows = []
values = []

for row in df.itertuples():
    # row[2] is the column which is either 2.0 or blank ("")
    if row[2]:
        if values:
            new_rows.append(values)
            values = []
        
        # row[7] is the column with a value like 9.15400
        values.append(row[7])
    else:
        # Add all non-blank values starting from the 4th column,
        # the first column that holds meaningful values in these rows
        values.extend(value for value in row[4:] if value)
    
new_rows.append(values)

# fillna("") replaces the missing values with blanks
new_df = pd.DataFrame(new_rows).fillna("")
print(new_df)

Output:

        0       1       2       3       4       5       6
0 9.15400 5.40189 0.77828 0.66432 0.44219 0.00000        
1 9.15400 0.00000                                        
2 9.15400 7.38451 3.99120 2.23459 1.49781 0.77828 0.00000
3 9.15400 2.09559 0.77828 0.00000                        
4 9.15400 2.09559 0.77828 0.65828 0.58990 0.00000             
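
To see why the loop above tests row[2] and reads row[7], it may help to look at what splitting on a double space produces for one header line and one continuation line. The sketch below uses plain str.split, which should behave the same way as Series.str.split for this separator; the variable names are illustrative only. Keep in mind that itertuples() puts the index at row[0], so row[k] corresponds to position k - 1 in these lists.

```python
# One header line and one continuation line, copied from the input above
header = "  2.0  0    3    9.15400"
continuation = "      5.40189    0.77828    0.66432"

header_parts = header.split("  ")
continuation_parts = continuation.split("  ")

print(header_parts)
# ['', '2.0', '0', '', '3', '', '9.15400']
print(continuation_parts)
# ['', '', '', '5.40189', '', '0.77828', '', '0.66432']

# Position 1 (row[2]) is "2.0" on header lines and "" on continuation lines
print(repr(header_parts[1]), repr(continuation_parts[1]))
# Position 6 (row[7]) holds the 9.15400 value on header lines
print(header_parts[6])
# Position 3 (row[4]) is where continuation values start
print(continuation_parts[3])
```

These positions depend on the exact number of spaces in the file, so they would shift if the real file is spaced differently from the sample in the question.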

Or if each column is fixed width, and is just not displayed correctly in your question, try this:

Input:

  2.0  0    3    9.15400
      5.40189    0.77828    0.66432
      0.44219    0.00000
  2.0  0    1    9.15400
      0.00000
  2.0  0    6    9.15400
      7.38451    3.99120    2.23459    1.49781    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.65828
      0.58990    0.00000

import pandas as pd

# Display floats with five decimal places, matching your desired output
pd.options.display.float_format = '{:.5f}'.format

filename = r"C:\Users\Bobson Dugnutt\Desktop\table.txt"

df = pd.read_fwf(filename, header=None).fillna("")

new_rows = []
values = []

for row in df.itertuples():
    # row[1] is the column which is either 2.0 or blank ("")
    if row[1] != "":
        if values:
            new_rows.append(values)
            values = []
        
        # row[4] is the column with a value like 9.15400
        values.append(row[4])
    else:
        # Add all non-blank values starting from the same column as above
        values.extend(value for value in row[4:] if value != "")

new_rows.append(values)

# fillna("") replaces the NaN values with blanks
new_df = pd.DataFrame(new_rows).astype(float).fillna("")

print(new_df)

Output:

        0       1       2       3       4       5       6
0 9.15400 5.40189 0.77828 0.66432 0.44219 0.00000        
1 9.15400 0.00000                                        
2 9.15400 7.38451 3.99120 2.23459 1.49781 0.77828 0.00000
3 9.15400 2.09559 0.77828 0.00000                        
4 9.15400 2.09559 0.77828 0.65828 0.58990 0.00000        

In both cases, all you have to do is iterate over the rows and check whether one of the first columns is non-blank (i.e. is 2.0). If it is, that row starts a new record: take its last value, then collect all the meaningful values from the following rows until you come across the next such row. The specific column index differs depending on how the table was originally parsed.
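
The same grouping idea can also be sketched without relying on column positions, by splitting each line on arbitrary whitespace. This is only a sketch under one assumption that the question does not confirm: a header line is taken to be any line with exactly four fields whose second and third fields are integers. The names is_header and parse_records are hypothetical helpers, not part of pandas.

```python
SAMPLE = """\
  2.0  0    3    9.15400
      5.40189    0.77828    0.66432
      0.44219    0.00000
  2.0  0    1    9.15400
      0.00000
  2.0  0    6    9.15400
      7.38451    3.99120    2.23459    1.49781    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.65828
      0.58990    0.00000
"""


def is_header(tokens):
    # Assumption: header lines have four fields and the 2nd and 3rd are integers
    return len(tokens) == 4 and tokens[1].isdigit() and tokens[2].isdigit()


def parse_records(lines):
    records = []
    for line in lines:
        tokens = line.split()  # split on any run of whitespace
        if not tokens:
            continue  # skip blank lines
        if is_header(tokens):
            # A header starts a new record; keep only its last value
            records.append([float(tokens[3])])
        elif records:
            # Continuation line: append its values to the current record
            records[-1].extend(float(token) for token in tokens)
    return records


records = parse_records(SAMPLE.splitlines())
for record in records:
    print(record)
```

Passing records to pd.DataFrame(records) should then give the ragged table, with pandas padding the shorter rows with NaN. For a real file, the lines could come from iterating over an open file object instead of SAMPLE.splitlines().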

huangapple
  • Posted on February 23, 2023, 22:56:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75546517.html