Turn data in a txt file into a DataFrame

Question

I have a problem. I have a data.txt file like this:

[image: contents of data.txt]

I want to convert it into a DataFrame like this:

[image: desired DataFrame]

How can I do this? I tried a regex, but it doesn't work.
Here is the file: https://www.dropbox.com/home?preview=xyraw.txt

Answer 1

Score: 1

Here is code that does the trick. Note that I read from a string instead of from a file, but you can easily swap that.

    from io import StringIO

    import pandas as pd

    data_string = """
    15_9CANAL
    15_9
    0
    0.000 1.190 <1>
    5.000 1.100
    10.000 1.160
    15.000 1.190
    16.000 1.100
    17.000 1.140
    25.000 -0.850
    30.000 -1.650
    35.000 -1.850 <2>
    40.000 -1.550
    45.000 -0.850
    48.000 1.140
    50.000 1.230
    52.000 1.230 <3>
    15_9CANAL
    15_9
    1500
    NoData
    15_9CANAL
    15_9
    3000
    0.000 1.370 <1>
    5.000 1.420
    10.000 1.310
    15.000 1.390
    16.000 0.360
    17.000 -0.440
    25.000 -0.940
    30.000 -1.440
    35.000 -1.640 <2>
    40.000 -1.040
    45.000 -0.740
    48.000 0.360
    50.000 1.270
    52.000 1.430
    53.000 1.430 <3>
    """

    data = []          # one dict per output row
    data_row = None    # row currently being built
    skip_row = False   # True right after a river line: the next line is ignored
    chainage = False   # True when the next line holds the chainage

    # To read from a file, iterate over an open file object instead of StringIO.
    for row in StringIO(data_string):
        if row.strip() == '':
            continue
        if skip_row:
            # the line after the river name (e.g. '15_9') carries no new data
            skip_row = False
            chainage = True
        elif chainage:
            chainage = False
            data_row['Chainage'] = row.strip()
        elif 'CANAL' in row:
            # new section: keep a pending row that got no x/y (a NoData section)
            if data_row:
                data.append(data_row)
            data_row = {'River': row.strip()}
            skip_row = True
        else:
            # data line: first two fields are x and y, markers like <1> are dropped
            values = row.split()
            if values[0] != 'NoData':
                data_row['x'] = float(values[0])
                data_row['y'] = float(values[1])
                data.append(data_row)
                data_row = {}

    # keep a trailing section that had no data points
    if data_row:
        data.append(data_row)

    df = pd.DataFrame(data)
    print(df)

This results in the dataframe you are asking for.
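As a sketch of the "read from a file instead of a string" swap mentioned above, the same loop can be wrapped in a function that accepts any iterable of lines. The function name `parse_sections`, the shortened sample data, and the `data.txt` path in the comment are illustrative assumptions, not part of the original answer, and the sketch assumes every section starts with a line containing `CANAL` followed by a short-name line and a chainage line.

```python
from io import StringIO

import pandas as pd


def parse_sections(lines):
    """Parse an iterable of text lines into a DataFrame (River, Chainage, x, y)."""
    rows = []
    current = None     # row being built for the current section
    header_left = 0    # header lines still expected after the river name
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if 'CANAL' in text:          # assumption: river-name lines contain 'CANAL'
            if current:
                rows.append(current)  # section without data points (NoData)
            current = {'River': text}
            header_left = 2
        elif header_left == 2:       # repeated short name, ignored
            header_left = 1
        elif header_left == 1:
            current['Chainage'] = text
            header_left = 0
        elif text != 'NoData':
            x, y = text.split()[:2]  # drops trailing markers such as <1>
            current['x'] = float(x)
            current['y'] = float(y)
            rows.append(current)
            current = {}
    if current:
        rows.append(current)
    return pd.DataFrame(rows)


# Shortened sample in the same layout as the data above.
sample = """15_9CANAL
15_9
0
0.000 1.190 <1>
5.000 1.100
15_9CANAL
15_9
1500
NoData
"""

# Any iterable of lines works; for a real file (hypothetical path):
#     with open('data.txt') as fh:
#         df = parse_sections(fh)
df = parse_sections(StringIO(sample))
print(df)
```

Because a file object yields lines just like `StringIO` does, nothing inside the function has to change when switching the input source.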

If you want the canal information in all rows, just comment out the check that appends data_row:

    if data_row is not None:
        data.append(data_row)
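Another way to get the canal information into every row, offered here as an alternative rather than as the answer's own method, is to forward-fill the `River` and `Chainage` columns once the DataFrame exists. The small frame below is a hand-made stand-in for the parser output with illustrative values; it assumes that the first row of every section carries both fields, which is what the loop above produces.

```python
import pandas as pd

# Stand-in for the parser output: River and Chainage appear only on the
# first row of each section (values are illustrative).
df = pd.DataFrame(
    [
        {'River': '15_9CANAL', 'Chainage': '0', 'x': 0.0, 'y': 1.19},
        {'x': 5.0, 'y': 1.10},
        {'River': '15_9CANAL', 'Chainage': '1500'},
        {'River': '15_9CANAL', 'Chainage': '3000', 'x': 0.0, 'y': 1.37},
        {'x': 5.0, 'y': 1.42},
    ]
)

# Carry the last seen River/Chainage down into the rows that lack them.
df[['River', 'Chainage']] = df[['River', 'Chainage']].ffill()
print(df)
```

The NoData section keeps its row with empty `x` and `y`, so it can still be told apart from sections that have measurements.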

huangapple
  • Posted on 2023-03-09 15:08:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75681416.html