Turn data in a .txt file into a DataFrame
Question
I have a problem. I have a data.txt file like this:
I want to convert it into a DataFrame like this:
How can I do it? I tried regex, but it doesn't work.
Here is the file: https://www.dropbox.com/home?preview=xyraw.txt
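Since the question mentions regex, here is a minimal sketch of how a regular expression could classify the lines. It assumes the layout seen in the sample data used in the answer: a header line containing `CANAL`, a short-name line, a chainage line, then either `x y [<n>]` point lines or a single `NoData` line. `POINT_RE` and `parse_with_regex` are hypothetical names. Unlike the answer's output, this version repeats `River` and `Chainage` on every row.

```python
import re

import pandas as pd

# Hypothetical pattern for a point line such as "35.000 -1.850 <2>":
# two signed decimals, optionally followed by a marker like "<2>".
POINT_RE = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)(?:\s+<\d+>)?")


def parse_with_regex(text):
    """Turn the assumed layout into one row per point (or per NoData section)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    records = []
    river = chainage = None
    i = 0
    while i < len(lines):
        line = lines[i]
        if "CANAL" in line:
            # Header block: river name, short name (ignored), chainage.
            river, chainage = line, float(lines[i + 2])
            i += 3
            continue
        match = POINT_RE.fullmatch(line)
        if match:
            x, y = float(match.group(1)), float(match.group(2))
        elif line == "NoData":
            # Keep the section, but without coordinates.
            x = y = float("nan")
        else:
            raise ValueError(f"Unexpected line: {line!r}")
        records.append({"River": river, "Chainage": chainage, "x": x, "y": y})
        i += 1
    return pd.DataFrame(records, columns=["River", "Chainage", "x", "y"])


sample = """
15_9CANAL
15_9
1500
NoData
15_9CANAL
15_9
3000
0.000 1.370 <1>
5.000 1.420
35.000 -1.640 <2>
"""
print(parse_with_regex(sample))
```

The pattern uses `fullmatch`, so a line only counts as a point if the whole stripped line has that shape; anything unexpected raises instead of being silently dropped.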
Answer 1
Score: 1
Here is code that does the trick. Note that I read from a string instead of from the file, but you can easily swap that.
data_string = """
15_9CANAL
15_9
0
0.000 1.190 <1>
5.000 1.100
10.000 1.160
15.000 1.190
16.000 1.100
17.000 1.140
25.000 -0.850
30.000 -1.650
35.000 -1.850 <2>
40.000 -1.550
45.000 -0.850
48.000 1.140
50.000 1.230
52.000 1.230 <3>
15_9CANAL
15_9
1500
NoData
15_9CANAL
15_9
3000
0.000 1.370 <1>
5.000 1.420
10.000 1.310
15.000 1.390
16.000 0.360
17.000 -0.440
25.000 -0.940
30.000 -1.440
35.000 -1.640 <2>
40.000 -1.040
45.000 -0.740
48.000 0.360
50.000 1.270
52.000 1.430
53.000 1.430 <3>
"""
from io import StringIO
import pandas as pd

data = []          # finished records, one dict per DataFrame row
data_row = None    # record currently being filled
skip_row = False   # True right after a "CANAL" header: skip the short-name line
chainage = False   # True when the next line holds the chainage value

# To read the real file, iterate over open("data.txt") instead of StringIO(data_string).
for row in StringIO(data_string):
    if row.strip() == '':
        continue
    if skip_row:
        # Line after the header (e.g. "15_9"): not needed.
        skip_row = False
        chainage = True
        continue
    elif chainage:
        chainage = False
        data_row['Chainage'] = row.strip()
    elif 'CANAL' in row:
        # New section: store any unfinished record, then start a new one.
        if data_row:
            data.append(data_row)
        data_row = {'River': row.strip()}
        skip_row = True
    else:
        # Point line ("x y [<n>]") or "NoData".
        values = row.split()
        if values[0] != 'NoData':
            data_row['x'] = values[0]
            data_row['y'] = values[1]
        data.append(data_row)
        data_row = {}

if data_row:
    data.append(data_row)

df = pd.DataFrame(data)
print(df)
This results in the dataframe you are asking for.
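The loop stores every value as text. If you need numbers (for example to plot or compute with `x` and `y`), one option is `pd.to_numeric` on each column. This is a minimal sketch on a few hand-made records shaped like the loop's output, with values copied from the sample data; converting to floats is an assumption about what you want.

```python
import pandas as pd

# Hand-made records shaped like the loop's output: all values are strings,
# and River/Chainage only appear on the first row of each section.
data = [
    {"River": "15_9CANAL", "Chainage": "0", "x": "0.000", "y": "1.190"},
    {"x": "5.000", "y": "1.100"},
    {"River": "15_9CANAL", "Chainage": "1500"},  # the "NoData" section
]
df = pd.DataFrame(data)

# Convert the numeric text columns to floats; missing values stay NaN.
for col in ["Chainage", "x", "y"]:
    df[col] = pd.to_numeric(df[col])

print(df.dtypes)
```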
If you want the canal information (`River` and `Chainage`) in every row rather than only in the first row of each section, forward-fill those two columns of the resulting DataFrame.
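Forward-filling `River` and `Chainage` after the DataFrame is built puts the canal information on every row. Here is a minimal sketch using `DataFrame.ffill` on a few hand-made records shaped like the loop's output (values copied from the sample data). Only those two columns are filled, so the `NoData` row keeps empty `x`/`y`.

```python
import pandas as pd

# Hand-made records shaped like the loop's output.
data = [
    {"River": "15_9CANAL", "Chainage": "0", "x": "0.000", "y": "1.190"},
    {"x": "5.000", "y": "1.100"},
    {"x": "10.000", "y": "1.160"},
    {"River": "15_9CANAL", "Chainage": "1500"},  # the "NoData" section
    {"River": "15_9CANAL", "Chainage": "3000", "x": "0.000", "y": "1.370"},
    {"x": "5.000", "y": "1.420"},
]
df = pd.DataFrame(data, columns=["River", "Chainage", "x", "y"])

# Copy each section's River and Chainage down into the point rows below it.
df[["River", "Chainage"]] = df[["River", "Chainage"]].ffill()
print(df)
```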