How to obtain a specific text from a file?

Question

I generated a data file with the following format:

    0.1
    Analytic value = 340.347685734
    Approximated value = 332.45634555
    --
    0.2
    Analytic value = 340.936745872
    Approximated value = 332.57893789
    --
    0.3
    ... and so on

I want to plot the analytic and approximated values with matplotlib or gnuplot against the input parameter (0.1, 0.2, etc.). Usually I generate the data with an awk script that puts the three values in three columns, which is very easy to plot. This time, however, I accidentally generated the data file in a different format. How can I convert this text file to the following (maybe using regex or awk)?

    0.1 340.347685734 332.45634555
    0.2 340.936745872 332.57893789
    0.3 ... and so on

Or is there a way that I can plot the data without converting the format using gnuplot/matplotlib?

EDIT:
I have attempted to do it using python3. The following is my code:

    file = open("myFile.dat",'r')
    newFile = open("newFile.dat", 'a')
    for i in range(4000):
        col1 = file.readline().split[-1]
        col2 = file.readline().split[-1]
        col3 = file.readline().split[-1]
        _ = file.readline().split[-1]
        line = col1 + " " + col2 + " " + col3
        newFile.write(line)

However, I got the error TypeError: 'builtin_function_or_method' object is not subscriptable, which I didn't understand, and I think this code is very inefficient. That's why I asked on SE. All the solutions presented so far work quite well. I marked the awk solution as the accepted answer because it's simple and elegant. I also appreciate the gnuplot-only solution, which revealed a side of gnuplot that was new to me.
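
The TypeError in the attempt above comes from writing `split[-1]` without parentheses: `split` is then the method object itself, and indexing a method fails. Calling it as `split()[-1]` returns the last whitespace-separated token. Below is a minimal corrected sketch of the same approach. The function name `convert`, the `with` blocks, and the write mode `"w"` (instead of the original append mode) are illustrative choices, and it assumes every record is exactly three value lines followed by a `--` line.

```python
def convert(src_path, dst_path):
    """Rewrite 4-line records (x, analytic, approximated, --) as one row each."""
    # Assumption: "w" overwrites the output; the original attempt used 'a'.
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        while True:
            # Read one record: three value lines plus the "--" separator.
            record = [src.readline() for _ in range(4)]
            values = record[:3]
            if not all(line.strip() for line in values):
                break  # end of file, or an incomplete trailing record
            # split() must be *called*; split[-1] raises the TypeError.
            cols = [line.split()[-1] for line in values]
            # The original attempt wrote no newline, so all rows ran together.
            dst.write(" ".join(cols) + "\n")
```

A call such as `convert("myFile.dat", "newFile.dat")` would then produce the three-column file, stopping at the first incomplete record instead of relying on a hard-coded count of 4000.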

Answer 1

Score: 2

No regex is needed here, just four simple replacements: two to remove the unwanted text, one to remove the line breaks, and one to insert a line break again at each separator.

    file = """0.1
    Analytic value = 340.347685734
    Approximated value = 332.45634555
    --
    0.2
    Analytic value = 340.936745872
    Approximated value = 332.57893789
    --
    0.3
    ... and so on
    """
    # Remove the two labels so only the numbers remain
    file = file.replace("Analytic value = ","")
    file = file.replace("Approximated value = ","")
    # Join everything onto one line, then break again at each "-- " separator
    file = file.replace("\n"," ")
    file = file.replace("-- ","\n")
    print(file)

Result:

    0.1 340.347685734 332.45634555
    0.2 340.936745872 332.57893789
    0.3 ... and so on

Answer 2

Score: 2

I would use GNU AWK for this task as follows. Let the content of file.txt be

    0.1
    Analytic value = 340.347685734
    Approximated value = 332.45634555
    --
    0.2
    Analytic value = 340.936745872
    Approximated value = 332.57893789
    --

then

    awk '/^--$/{print "";next}{printf "%s ",$NF}' file.txt

outputs

    0.1 340.347685734 332.45634555
    0.2 340.936745872 332.57893789

Explanation: for a line that is exactly --, just print a newline and go to the next line; for all other lines, output the last field followed by a space instead of a newline. If you want to know more about NF, read 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR

(tested in GNU Awk 5.1.0)
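
For readers who want to stay in Python, the logic of the awk one-liner above can be mirrored roughly as follows. This is only a sketch: the function name `last_fields` is made up for illustration, and unlike the awk version it skips blank lines, tolerates whitespace around the `--` separator, and also emits a final record that has no trailing `--`.

```python
import io


def last_fields(lines):
    """Yield one space-joined row per record, mirroring the awk one-liner."""
    row = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines (a small deviation from the awk version)
        if fields == ["--"]:
            yield " ".join(row)  # separator line: emit the collected row
            row = []
        else:
            row.append(fields[-1])  # like $NF: the last whitespace-separated field
    if row:
        yield " ".join(row)  # final record without a trailing "--"


# Small demonstration on in-memory data; a real file object works the same way.
sample = io.StringIO(
    "0.1\nAnalytic value = 340.347685734\nApproximated value = 332.45634555\n--\n"
    "0.2\nAnalytic value = 340.936745872\nApproximated value = 332.57893789\n--\n"
)
for row in last_fields(sample):
    print(row)
```

Because the function is a generator, it processes the input one line at a time and never holds the whole file in memory.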

Answer 3

Score: 1

There are many ways to solve this problem, and the choice depends, among other things, on the file size. Here is a simple solution for the case where you cannot load the whole file at once and have to process it line by line:

    raw_data_file = 'data.txt'
    out_data_file = 'data_final.txt'

    counter = 0
    with open(raw_data_file, 'r') as fin, open(out_data_file, 'w') as fout:
        temp_line = []
        for line in fin:
            if counter == 0:
                # First column
                temp_line.append(line.strip())
                counter += 1
                continue
            elif counter == 1:
                # Analytic value column
                temp_line.append(line.strip().split()[-1])
                counter += 1
                continue
            elif counter == 2:
                # Approximate value column
                temp_line.append(line.strip().split()[-1])
                counter += 1
            elif counter == 3:
                # Skip the -- and reset the counter
                counter = 0
                continue
            # Write the rearranged data to file
            fout.write(' '.join(temp_line))
            fout.write('\n')
            temp_line = []

Note that this solution relies tightly on the structure of the file that you provided.
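
If that dependence on line order is a concern, one possible variation is to key each value by its label rather than by its position in the record. The sketch below is an assumption-laden illustration, not part of the answer above: the function name `parse_records` and the dictionary key `"x"` for the input parameter are invented here, and any line that is neither a bare number, a `label = value` pair, nor `--` will raise a `ValueError`.

```python
def parse_records(lines):
    """Collect each record into a dict keyed by label, one dict per record."""
    records, current = [], {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line == "--":
            # Separator: close the current record, if it has any content.
            if current:
                records.append(current)
            current = {}
        elif "=" in line:
            # "Analytic value = 340.3" -> key "Analytic value", value 340.3
            label, value = line.split("=", 1)
            current[label.strip()] = float(value)
        else:
            # A bare number is taken to be the input parameter.
            current["x"] = float(line)
    if current:
        records.append(current)  # last record without a trailing "--"
    return records


# Demonstration with the two value lines deliberately swapped.
demo = [
    "0.1",
    "Approximated value = 332.45634555",
    "Analytic value = 340.347685734",
    "--",
]
for rec in parse_records(demo):
    print(rec["x"], rec["Analytic value"], rec["Approximated value"])
```

The function accepts any iterable of lines, so an open file object can be passed directly and is still read line by line.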

Answer 4

Score: 1

> Or is there a way that I can plot the data without converting the format using gnuplot/matplotlib?

Yes, there is! Here is a platform-independent gnuplot-only solution. No need for external extra data preparation tools.

If you are plotting from a file, skip the $Data <<EOD ... EOD section and use plot 'yourFile.dat' ... instead.

Script: (works for gnuplot>=5.0.6, March 2017)

    ### plot special data format
    reset session

    $Data <<EOD
    0.1
    Analytic value = 340.347685734
    Approximated value = 332.45634555
    --
    0.2
    Analytic value = 340.936745872
    Approximated value = 332.57893789
    --
    0.3
    Analytic value = 341.936745872
    Approximated value = 333.57893789
    EOD

    set datafile missing NaN
    set key out
    myFilter(colD,colF,valF) = strcol(colF) eq valF ? column(colD) : NaN

    plot $Data u (valid(1)?x0=$1:x0):(myFilter(4,1,"Analytic")) w lp pt 7 lc "red" ti "analytic", \
            '' u (valid(1)?x0=$1:x0):(myFilter(4,1,"Approximated")) w lp pt 7 lc "blue" ti "approximated"
    ### end of script

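
Since the question also mentions matplotlib, the same idea (remember the last valid input parameter, then keep only the rows whose first word matches a label) can be sketched in Python. This is a rough analogue under stated assumptions, not a translation of the gnuplot script: the function name `series` is hypothetical, and non-matching rows are dropped rather than turned into NaN.

```python
def series(lines, label):
    """Return (xs, ys) for rows whose first word equals `label`."""
    xs, ys = [], []
    x0 = None  # last valid input parameter seen, like x0 in the gnuplot script
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        try:
            x0 = float(fields[0])  # a line starting with a number begins a record
            continue
        except ValueError:
            pass  # not a number: a label line or the "--" separator
        if fields[0] == label and len(fields) >= 4 and x0 is not None:
            xs.append(x0)
            ys.append(float(fields[3]))  # 4th field, like column(4) in gnuplot
    return xs, ys


data = [
    "0.1", "Analytic value = 340.347685734", "Approximated value = 332.45634555", "--",
    "0.2", "Analytic value = 340.936745872", "Approximated value = 332.57893789", "--",
]
print(series(data, "Analytic"))
print(series(data, "Approximated"))
```

The two returned lists have matching lengths, so they could be handed straight to a plotting call such as matplotlib's `plt.plot(xs, ys)` without writing an intermediate file.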

Answer 5

Score: 1

Using any awk:

    $ awk '{n=(NR%4); val[n]=$NF} n==0{print val[1], val[2], val[3]}' file
    0.1 340.347685734 332.45634555
    0.2 340.936745872 332.57893789

huangapple
  • Posted on June 8, 2023, 00:47:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76425507.html