英文:
Opening textfile in python
问题
数据文件的结构如下:
2.0 0 3 9.15400
5.40189 0.77828 0.66432
0.44219 0.00000
2.0 0 1 9.15400
0.00000
2.0 0 6 9.15400
7.38451 3.99120 2.23459 1.49781 0.77828 0.00000
2.0 0 3 9.15400
2.09559 0.77828 0.00000
2.0 0 3 9.15400
2.09559 0.77828 0.65828
0.58990 0.00000
我想创建一个数据框,应该如下所示:
9.15400 5.40189 0.77828 0.66432 0.44219 0.00000
9.15400 0.00000
9.15400 7.38451 3.99120 2.23459 1.49781 0.77828 0.000
9.15400 2.09559 0.77828 0.00000
9.15400 2.09559 0.77828 0.65828 0.58990 0.00000
请问有人可以告诉我如何开始处理这个问题吗?
英文:
The structure of the data file looks like this:
2.0 0 3 9.15400
5.40189 0.77828 0.66432
0.44219 0.00000
2.0 0 1 9.15400
0.00000
2.0 0 6 9.15400
7.38451 3.99120 2.23459 1.49781 0.77828 0.00000
2.0 0 3 9.15400
2.09559 0.77828 0.00000
2.0 0 3 9.15400
2.09559 0.77828 0.65828
0.58990 0.00000
and so on
I want to create a data frame that should look like this:
9.15400 5.40189 0.77828 0.66432 0.44219 0.00000
9.15400 0.00000
9.15400 7.38451 3.99120 2.23459 1.49781 0.77828 0.000
9.15400 2.09559 0.77828 0.00000
9.15400 2.09559 0.77828 0.65828 0.58990 0.00000
Can someone please help me how I can get started with this?
答案1
得分: 0
为什么你期望的数据框的第0行不包含0.66432
?
如果表格结构不清晰,尝试以下方法:
df = pd.read_table(filename, header=None)
df = df[0].str.split(" ", expand=True)
new_rows = []
values = []
for row in df.itertuples():
if row[2]:
if values:
new_rows.append(values)
values = []
values.append(row[7])
else:
values.extend(value for value in row[4:] if value)
new_rows.append(values)
new_df = pd.DataFrame(new_rows).fillna("")
print(new_df)
或者,如果每一列的宽度是固定的,只是在你的问题中显示不正确,尝试这个:
pd.options.display.float_format = '{:.5f}'.format
df = pd.read_fwf(filename, header=None).fillna("")
new_rows = []
values = []
for row in df.itertuples():
if row[1] != "":
if values:
new_rows.append(values)
values = []
values.append(row[4])
else:
values.extend(value for value in row[4:] if value != "")
new_rows.append(values)
new_df = pd.DataFrame(new_rows).astype(float).fillna("")
print(new_df)
英文:
Why doesn't the 0th row of your desired dataframe include 0.66432
?
It's unclear how the table is structured. If it is as unstructured as is shown in your question, try this:
Input:
2.0 0 3 9.15400
5.40189 0.77828 0.66432
0.44219 0.00000
2.0 0 1 9.15400
0.00000
2.0 0 6 9.15400
7.38451 3.99120 2.23459 1.49781 0.77828 0.00000
2.0 0 3 9.15400
2.09559 0.77828 0.00000
2.0 0 3 9.15400
2.09559 0.77828 0.65828
0.58990 0.00000
filename = r"C:\Users\Bobson Dugnutt\Desktop\table2.txt"
# This returns a dataframe with a single column
df = pd.read_table(filename, header=None)
# Split the bad-boy at every double space
df = df[0].str.split(" ", expand=True)
new_rows = []
values = []
for row in df.itertuples():
# row[2] is the column which is either 2.0 or blank ("")
if row[2]:
if values:
new_rows.append(values)
values = []
# row[7] is the column with a value like 9.15400
values.append(row[7])
else:
# Add all non-blank values starting from the the 4th column.
# The 4th column is the first column meaningful values are
# found for these rows
values.extend(value for value in row[4:] if value)
new_rows.append(values)
# fillna("") so make the NaN values blank
new_df = pd.DataFrame(new_rows).fillna("")
print(new_df)
Output:
0 1 2 3 4 5 6
0 9.15400 5.40189 0.77828 0.66432 0.44219 0.00000
1 9.15400 0.00000
2 9.15400 7.38451 3.99120 2.23459 1.49781 0.77828 0.00000
3 9.15400 2.09559 0.77828 0.00000
4 9.15400 2.09559 0.77828 0.65828 0.58990 0.00000
Or if each column is fixed width, and is just not displayed correctly in your question, try this:
Input:
2.0 0 3 9.15400
5.40189 0.77828 0.66432
0.44219 0.00000
2.0 0 1 9.15400
0.00000
2.0 0 6 9.15400
7.38451 3.99120 2.23459 1.49781 0.77828 0.00000
2.0 0 3 9.15400
2.09559 0.77828 0.00000
2.0 0 3 9.15400
2.09559 0.77828 0.65828
0.58990 0.00000
import pandas as pd
# Make the floats the same width of your desired output
pd.options.display.float_format = '{:.5f}'.format
filename = r"C:\Users\Bobson Dugnutt\Desktop\table.txt"
df = pd.read_fwf(filename, header=None).fillna("")
new_rows = []
values = []
for row in df.itertuples():
# row[1] is the column which is either 2.0 or blank ("")
if row[1] != "":
if values:
new_rows.append(values)
values = []
# row[4] is the column with a value like 9.15400
values.append(row[4])
else:
# Add all non-blank values starting from the same column as above
values.extend(value for value in row[4:] if value != "")
new_rows.append(values)
# fillna("") so make the NaN values blank
new_df = pd.DataFrame(new_rows).astype(float).fillna("")
print(new_df)
Output:
0 1 2 3 4 5 6
0 9.15400 5.40189 0.77828 0.66432 0.44219 0.00000
1 9.15400 0.00000
2 9.15400 7.38451 3.99120 2.23459 1.49781 0.77828 0.00000
3 9.15400 2.09559 0.77828 0.00000
4 9.15400 2.09559 0.77828 0.65828 0.58990 0.00000
In both cases all you have to do is iterate over each row, check if one of the first columns is not blank (i.e. is 2.0
), and if it is, then get all the other meaningful values in the next rows until you come across another similar row. The specific index differs depending on how the table was originally parsed.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论