Turn data in a txt file into a DataFrame
Question
I have a data.txt file like this:

I want to convert it into a DataFrame like this:

How can I do this? I tried regex, but it didn't work.
Here is the file:
<a href="https://www.dropbox.com/home?preview=xyraw.txt"> file </a>
Answer 1
Score: 1
Here is code that does the job. Note that I read the data from a string rather than from a file, but you can easily swap that.
data_string = """
15_9CANAL
15_9
         0
     0.000     1.190 <1>
     5.000     1.100
    10.000     1.160
    15.000     1.190
    16.000     1.100
    17.000     1.140
    25.000    -0.850
    30.000    -1.650
    35.000    -1.850 <2>
    40.000    -1.550
    45.000    -0.850
    48.000     1.140
    50.000     1.230
    52.000     1.230 <3>
15_9CANAL
15_9
      1500
NoData
15_9CANAL
15_9
      3000
     0.000     1.370 <1>
     5.000     1.420
    10.000     1.310
    15.000     1.390
    16.000     0.360
    17.000    -0.440
    25.000    -0.940
    30.000    -1.440
    35.000    -1.640 <2>
    40.000    -1.040
    45.000    -0.740
    48.000     0.360
    50.000     1.270
    52.000     1.430
    53.000     1.430 <3>
"""
from io import StringIO
import pandas as pd

data = []
data_row = None    # record currently being built
skip_row = False   # True right after a 'CANAL' header: the next line is skipped
chainage = False   # True when the next line holds the chainage value
for row in StringIO(data_string):
    if row.strip() == '':
        continue
    if skip_row:
        # the line after the 'CANAL' header (e.g. '15_9') is not needed
        skip_row = False
        chainage = True
        continue
    elif chainage:
        chainage = False
        data_row['Chainage'] = row.strip()
    elif 'CANAL' in row:
        # new section: flush a pending record (a section that had no data rows)
        if data_row is not None and data_row != {}:
            data.append(data_row)
        data_row = {}
        data_row['River'] = row.strip()
        skip_row = True
    else:
        # data line: x, y and an optional marker such as <1>, which is ignored
        values = row.split()
        if values[0] != 'NoData':
            data_row['x'] = values[0]
            data_row['y'] = values[1]
            data.append(data_row)
            data_row = {}
# flush the last record if it was never appended (e.g. a trailing NoData section)
if data_row is not None and data_row != {}:
    data.append(data_row)
df = pd.DataFrame(data)
print(df)
This produces the DataFrame you asked for.
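To read from a file instead of a string, the same loop can iterate over an open file handle, since a file object yields lines just like StringIO does. The following is only a sketch, assuming the file has the same layout as data_string; the helper name parse_canal_lines, the shortened sample, and the temporary data.txt path are hypothetical and only there to make the example self-contained.

```python
import tempfile
from pathlib import Path

import pandas as pd


def parse_canal_lines(lines):
    """Same state machine as the answer's loop, for any iterable of lines."""
    data = []
    data_row = None
    skip_row = False   # next line is the id line after a 'CANAL' header
    chainage = False   # next line holds the chainage value
    for row in lines:
        text = row.strip()
        if not text:
            continue
        if skip_row:
            skip_row, chainage = False, True
        elif chainage:
            chainage = False
            data_row["Chainage"] = text
        elif "CANAL" in text:
            if data_row:                 # flush a section that had no data rows
                data.append(data_row)
            data_row = {"River": text}
            skip_row = True
        elif text != "NoData":
            x, y = text.split()[:2]      # a trailing marker such as <1> is ignored
            data_row.update(x=x, y=y)
            data.append(data_row)
            data_row = {}
    if data_row:
        data.append(data_row)
    return pd.DataFrame(data)


# Shortened sample in the same layout as data_string (assumption).
sample = """\
15_9CANAL
15_9
         0
     0.000     1.190 <1>
     5.000     1.100
15_9CANAL
15_9
      1500
NoData
"""

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.txt"        # hypothetical file name
    path.write_text(sample)
    with open(path) as f:
        df_from_file = parse_canal_lines(f)

print(df_from_file)
```

With a real file, only the open(...) call needs to point at your own path; the parsing function stays the same.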
If you want the canal information in every row, just comment out the following:
if data_row is not None:
    data.append(data_row)
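As written, the loop attaches River and Chainage only to the first row of each section, and the remaining rows hold just x and y. Another option that leaves the loop untouched is to forward-fill those two columns afterwards with pandas. This is only a sketch: the small frame below is a hypothetical stand-in for the loop's output, and converting x and y to numbers is an extra assumption about what you want.

```python
import pandas as pd

# Hypothetical stand-in for the frame built by the parsing loop.
df = pd.DataFrame([
    {"River": "15_9CANAL", "Chainage": "0", "x": "0.000", "y": "1.190"},
    {"x": "5.000", "y": "1.100"},
    {"River": "15_9CANAL", "Chainage": "1500"},   # NoData section
    {"River": "15_9CANAL", "Chainage": "3000", "x": "0.000", "y": "1.370"},
    {"x": "5.000", "y": "1.420"},
])

filled = df.copy()
# Carry the last seen River/Chainage down to the rows that lack them.
filled[["River", "Chainage"]] = filled[["River", "Chainage"]].ffill()
# The parser stores strings; convert the coordinates to floats.
filled[["x", "y"]] = filled[["x", "y"]].apply(pd.to_numeric)

print(filled)
```

The NoData section keeps its own row with missing x and y, so it can still be dropped later with dropna if it is not needed.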