March 9, 2023, 15:08:41

Turn data in a txt file into a DataFrame

Question


I have a problem. I have a data.txt file like this:

[image: excerpt of data.txt]

I want to convert it into a DataFrame like this:

[image: desired DataFrame]

How can I do this? I tried a regex, but it didn't work.
The file is available here: https://www.dropbox.com/home?preview=xyraw.txt

Answer 1

Score: 1


Here is code that does the trick. Note that I read from a string instead of from a file, but you can easily swap that.

from io import StringIO

import pandas as pd

data_string = """
15_9CANAL
15_9
         0
     0.000     1.190 <1>
     5.000     1.100
    10.000     1.160
    15.000     1.190
    16.000     1.100
    17.000     1.140
    25.000    -0.850
    30.000    -1.650
    35.000    -1.850 <2>
    40.000    -1.550
    45.000    -0.850
    48.000     1.140
    50.000     1.230
    52.000     1.230 <3>
15_9CANAL
15_9
      1500
NoData
15_9CANAL
15_9
      3000
     0.000     1.370 <1>
     5.000     1.420
    10.000     1.310
    15.000     1.390
    16.000     0.360
    17.000    -0.440
    25.000    -0.940
    30.000    -1.440
    35.000    -1.640 <2>
    40.000    -1.040
    45.000    -0.740
    48.000     0.360
    50.000     1.270
    52.000     1.430
    53.000     1.430 <3>
"""

data = []          # one dict per output row
data_row = None    # row currently being assembled
skip_row = False   # True right after a 'CANAL' line
chainage = False   # True when the next line holds the chainage value

for row in StringIO(data_string):
    if row.strip() == '':
        continue
    if skip_row:
        # the line right after the river name is skipped; the chainage follows
        skip_row = False
        chainage = True
        continue
    elif chainage:
        chainage = False
        data_row['Chainage'] = row.strip()
    elif 'CANAL' in row:
        # a new block starts; keep a pending row (e.g. from a NoData block)
        if data_row is not None and data_row != {}:
            data.append(data_row)
        data_row = {}
        data_row['River'] = row.strip()
        skip_row = True
    else:
        # read data: the first two whitespace-separated values are x and y
        values = row.split()
        if values[0] != 'NoData':
            data_row['x'] = values[0]
            data_row['y'] = values[1]
            data.append(data_row)
            data_row = {}

# keep a trailing block that had no data rows
if data_row is not None and data_row != {}:
    data.append(data_row)

df = pd.DataFrame(data)
print(df)

This results in the dataframe you are asking for.
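
To read from the file rather than from a string, the same loop can be wrapped in a function that accepts any iterable of lines. The following is only a sketch under that assumption: `parse_sections` and `sample` are hypothetical names introduced for illustration, and the conversion of x and y with `pd.to_numeric` is an addition, since the loop stores them as strings.

```python
from io import StringIO

import pandas as pd


def parse_sections(lines):
    """Parse river/chainage blocks from an iterable of text lines.

    Hypothetical helper: the same state machine as the loop in the answer,
    but usable with an open file as well as a StringIO.
    """
    data, data_row = [], None
    skip_row = chainage = False
    for row in lines:
        text = row.strip()
        if not text:
            continue
        if skip_row:
            # line after the river name is skipped; chainage comes next
            skip_row, chainage = False, True
        elif chainage:
            chainage = False
            data_row['Chainage'] = text
        elif 'CANAL' in text:
            if data_row:
                data.append(data_row)
            data_row = {'River': text}
            skip_row = True
        else:
            values = text.split()
            if values[0] != 'NoData':
                data_row['x'], data_row['y'] = values[0], values[1]
                data.append(data_row)
                data_row = {}
    if data_row:
        data.append(data_row)
    df = pd.DataFrame(data, columns=['River', 'Chainage', 'x', 'y'])
    # x and y were read as strings; convert them to floats
    df[['x', 'y']] = df[['x', 'y']].apply(pd.to_numeric)
    return df


# Shortened sample in the same layout as the data above
sample = StringIO("""15_9CANAL
15_9
         0
     0.000     1.190 <1>
     5.000     1.100
15_9CANAL
15_9
      1500
NoData
""")
df = parse_sections(sample)
print(df)
```

With a real file, `with open('data.txt') as f: df = parse_sections(f)` should behave the same way, assuming the file has the layout shown in the answer.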

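If you want the canal information in all rows, carry River and Chainage over into each new data_row instead of resetting it to {}, and comment out both occurrences of

if data_row is not None and data_row != {}:
    data.append(data_row)

Blocks that contain only NoData will then no longer appear in the result.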
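
An alternative way to get the canal information on every row, sketched here as an option rather than as the answer's own method, is to forward-fill the River and Chainage columns once the DataFrame is built. The small `df` below is a hypothetical stand-in for the parser's output.

```python
import pandas as pd

# Hypothetical stand-in for the parsed result: River and Chainage are
# present only on the first row of each block.
df = pd.DataFrame([
    {'River': '15_9CANAL', 'Chainage': '0', 'x': '0.000', 'y': '1.190'},
    {'x': '5.000', 'y': '1.100'},
    {'River': '15_9CANAL', 'Chainage': '1500'},
    {'River': '15_9CANAL', 'Chainage': '3000', 'x': '0.000', 'y': '1.370'},
    {'x': '5.000', 'y': '1.420'},
])

# Copy the last seen River/Chainage down into the rows that lack them
df[['River', 'Chainage']] = df[['River', 'Chainage']].ffill()
print(df)
```

This keeps the NoData block as a row with empty x and y, which can be dropped afterwards with `df.dropna(subset=['x', 'y'])` if it is not wanted.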
