How to obtain a specific text from a file?

Question

I generated a data file with the following format:

0.1
Analytic value = 340.347685734
Approximated value = 332.45634555
--
0.2
Analytic value = 340.936745872
Approximated value = 332.57893789
--
0.3
... and so on

I want to plot the analytic and approximate values in matplotlib/gnuplot against the input parameter (0.1, 0.2, etc.). Usually I generate the data file with an awk script that puts the three values in three columns, which is very easy to plot. However, here I accidentally generated the data file in a different format. How can I convert this text file to the following (maybe using regex or awk)?

0.1 340.347685734 332.45634555 
0.2 340.936745872 332.57893789
0.3 ... and so on

Or is there a way that I can plot the data without converting the format using gnuplot/matplotlib?

EDIT:
I have attempted to do it using python3. The following is my code:

file = open("myFile.dat",'r')
newFile = open("newFile.dat", 'a')
for i in range(4000):
  col1 = file.readline().split[-1]
  col2 = file.readline().split[-1]
  col3 = file.readline().split[-1]
  _ = file.readline().split[-1]
  line = col1 + " " + col2 + " " + col3
  newFile.write(line)

However, I got the error TypeError: 'builtin_function_or_method' object is not subscriptable, which I didn't understand, and I think this code is very inefficient. That's why I asked on SE. All the solutions presented so far work quite well. I marked the awk solution as the accepted answer because it's simple and elegant. I also appreciate the gnuplot-only solution, which uncovered a side of gnuplot for me.
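The TypeError comes from writing split without parentheses: file.readline().split[-1] tries to index the method object itself instead of the list that calling it returns. Below is a minimal corrected sketch, assuming every block is complete (parameter, analytic line, approximated line, then a -- separator). The helper name convert is hypothetical, and the loop runs until end of file instead of relying on a fixed count of 4000.

```python
import io

def convert(src, dst):
    """Copy four-line blocks from src to dst as 'x analytic approx' rows."""
    while True:
        first = src.readline()
        if not first.strip():                  # end of file (or a blank line): stop
            break
        col1 = first.split()[-1]               # note the call: split(), not split
        col2 = src.readline().split()[-1]      # last token of 'Analytic value = ...'
        col3 = src.readline().split()[-1]      # last token of 'Approximated value = ...'
        src.readline()                         # discard the '--' separator line
        dst.write(f"{col1} {col2} {col3}\n")   # write() does not add a newline itself

# In-memory stand-ins for the files, so the sketch runs without touching disk.
sample = io.StringIO(
    "0.1\nAnalytic value = 340.347685734\nApproximated value = 332.45634555\n--\n"
    "0.2\nAnalytic value = 340.936745872\nApproximated value = 332.57893789\n--\n"
)
out = io.StringIO()
convert(sample, out)
print(out.getvalue(), end="")
```

With real files, the same function could be called as convert(src, dst) inside a block such as with open("myFile.dat") as src, open("newFile.dat", "w") as dst. Opening the output in "w" mode rather than "a" avoids appending to results left over from an earlier run.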

Answer 1

Score: 2

No regex is needed here, just four simple replacements:

two to strip the unwanted labels, one to remove the line breaks, and one to turn each separator back into a line break.

file = """0.1
Analytic value = 340.347685734
Approximated value = 332.45634555
--
0.2
Analytic value = 340.936745872
Approximated value = 332.57893789
--
0.3
... and so on
"""

file = file.replace("Analytic value = ","")
file = file.replace("Approximated value = ","")
file = file.replace("\n"," ")
file = file.replace("-- ","\n")
print(file)

Result:

0.1 340.347685734 332.45634555 
0.2 340.936745872 332.57893789 
0.3 ... and so on 
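The same four replacements can be wrapped in a function and applied to text read from disk. This is only a sketch: flatten is a hypothetical name, it assumes every -- separator is followed by a newline, and it adds one step not in the answer above, trimming the trailing space that the replacements leave on each row.

```python
def flatten(text):
    """Apply the four replacements, then trim the trailing space on each row."""
    text = text.replace("Analytic value = ", "")
    text = text.replace("Approximated value = ", "")
    text = text.replace("\n", " ")      # join everything into one long line
    text = text.replace("-- ", "\n")    # each separator becomes a row break
    rows = [row.strip() for row in text.splitlines() if row.strip()]
    return "\n".join(rows) + "\n"

sample = (
    "0.1\nAnalytic value = 340.347685734\nApproximated value = 332.45634555\n--\n"
    "0.2\nAnalytic value = 340.936745872\nApproximated value = 332.57893789\n--\n"
)
print(flatten(sample), end="")
```

For a file, the input could come from pathlib.Path("myFile.dat").read_text() and the result could be written back with Path("newFile.dat").write_text(...). This reads the whole file into memory, which should be fine for a few thousand blocks.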

Answer 2

Score: 2

I would use GNU AWK for this task in the following way. Let file.txt contain

0.1
Analytic value = 340.347685734
Approximated value = 332.45634555
--
0.2
Analytic value = 340.936745872
Approximated value = 332.57893789
--

then

awk '/^--$/{print "";next}{printf "%s ",$NF}' file.txt

outputs

0.1 340.347685734 332.45634555 
0.2 340.936745872 332.57893789

Explanation: for a line that is exactly --, just print a newline and move on to the next line; for all other lines, output the last field followed by a space instead of a newline. If you want to know more about NF, read 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR

(tested in GNU Awk 5.1.0)
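For readers less familiar with awk, the logic of the one-liner can be mirrored roughly in Python. This is an analogue, not a translation: last_fields is a hypothetical name, blank lines are skipped, a final block without a closing -- is still emitted, and rows are joined without the trailing space that the awk version prints.

```python
def last_fields(lines):
    """Collect the last whitespace-separated field of each line, one row per block."""
    rows, current = [], []
    for line in lines:
        if line.strip() == "--":          # like /^--$/ : finish the current row
            rows.append(" ".join(current))
            current = []
        elif line.split():                # non-blank line: keep its last field, like $NF
            current.append(line.split()[-1])
    if current:                           # final block that has no closing '--'
        rows.append(" ".join(current))
    return rows

sample = """0.1
Analytic value = 340.347685734
Approximated value = 332.45634555
--
0.2
Analytic value = 340.936745872
Approximated value = 332.57893789
--
"""
for row in last_fields(sample.splitlines()):
    print(row)
```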

Answer 3

Score: 1

There are many ways to solve this problem, and the choice depends, among other things, on the file size. Here is a simple solution for the case where you cannot load the whole file at once and have to process it line by line:

raw_data_file = 'data.txt'
out_data_file = 'data_final.txt'

counter = 0
with open(raw_data_file, 'r') as fin, open(out_data_file, 'w') as fout:
    temp_line = []
    for line in fin:

        if counter == 0:
            # First column
            temp_line.append(line.strip())
            counter += 1
            continue
        elif counter == 1:
            # Analytic value column
            temp_line.append(line.strip().split()[-1])
            counter += 1
            continue
        elif counter == 2:
            # Approximate value column
            temp_line.append(line.strip().split()[-1])
            counter += 1
        elif counter == 3:
            # Skip the -- and reset the counter
            counter = 0
            continue

        # Write the rearranged data to file
        fout.write(' '.join(temp_line))
        fout.write('\n')
        temp_line = []

Note that this solution relies tightly on the structure of the file that you provided.
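If the dependence on the exact line layout is a concern, one alternative is to pull every number out of the text with a regular expression and group the numbers in threes, then plot them directly with matplotlib. The sketch below swaps one assumption for another: it no longer cares about line positions, but it assumes the file contains no numbers other than the data. The names parse_triples, plot_triples and comparison.png are hypothetical.

```python
import re

# Matches plain decimals such as 0.1 or 340.347685734, with optional sign and exponent.
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")

def parse_triples(text):
    """Return [(x, analytic, approximated), ...] from all numbers found in text."""
    values = [float(tok) for tok in NUMBER.findall(text)]
    usable = len(values) - len(values) % 3          # ignore an incomplete last block
    return [tuple(values[i:i + 3]) for i in range(0, usable, 3)]

def plot_triples(triples, out_path="comparison.png"):
    """Plot both value columns against the input parameter and save the figure."""
    import matplotlib
    matplotlib.use("Agg")                           # render to a file, no display needed
    import matplotlib.pyplot as plt

    xs, analytic, approx = zip(*triples)
    fig, ax = plt.subplots()
    ax.plot(xs, analytic, "o-", label="analytic")
    ax.plot(xs, approx, "o-", label="approximated")
    ax.set_xlabel("input parameter")
    ax.legend()
    fig.savefig(out_path)

sample = (
    "0.1\nAnalytic value = 340.347685734\nApproximated value = 332.45634555\n--\n"
    "0.2\nAnalytic value = 340.936745872\nApproximated value = 332.57893789\n--\n"
)
triples = parse_triples(sample)
print(triples)
```

Calling plot_triples(triples) would then write the figure. The matplotlib import is kept inside the function so that the parsing part can be used without matplotlib installed.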

Answer 4

Score: 1

> Or is there a way that I can plot the data without converting the format using gnuplot/matplotlib?

Yes, there is! Here is a platform-independent gnuplot-only solution. No need for external extra data preparation tools.

If you are plotting from a file, skip the $Data <<EOD ... EOD section and use plot 'yourFile.dat' ... instead.

Script: (works for gnuplot>=5.0.6, March 2017)

### plot special data format
reset session

$Data <<EOD
0.1
Analytic value = 340.347685734
Approximated value = 332.45634555
--
0.2
Analytic value = 340.936745872
Approximated value = 332.57893789
--
0.3
Analytic value = 341.936745872
Approximated value = 333.57893789
EOD

set datafile missing NaN
set key out
myFilter(colD,colF,valF) = strcol(colF) eq valF ? column(colD) : NaN

plot $Data u (valid(1)?x0=$1:x0):(myFilter(4,1,"Analytic"))     w lp pt 7 lc "red"  ti "analytic", \
        '' u (valid(1)?x0=$1:x0):(myFilter(4,1,"Approximated")) w lp pt 7 lc "blue" ti "approximated"
### end of script
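The script rests on two ideas: x0 remembers the most recent line whose first column is a valid number, and myFilter only lets a value through when the first column equals the wanted label. For comparison, a rough Python analogue of that carry-forward-and-filter logic might look as follows. It is a sketch rather than a translation of the gnuplot code, and split_series is a hypothetical name.

```python
def split_series(lines):
    """Return {'Analytic': [(x, y), ...], 'Approximated': [(x, y), ...]}."""
    series = {"Analytic": [], "Approximated": []}
    x0 = None                                   # last valid x, like x0 in the gnuplot script
    for line in lines:
        fields = line.split()
        if len(fields) == 1:
            try:
                x0 = float(fields[0])           # a lone number starts a new block
            except ValueError:
                pass                            # e.g. the '--' separator: keep the old x0
        elif len(fields) == 4 and fields[0] in series and x0 is not None:
            # 'Analytic value = 340.3...' -> the label is column 1, the value column 4
            series[fields[0]].append((x0, float(fields[3])))
    return series

block_lines = """0.1
Analytic value = 340.347685734
Approximated value = 332.45634555
--
0.2
Analytic value = 340.936745872
Approximated value = 332.57893789
--
0.3
Analytic value = 341.936745872
Approximated value = 333.57893789
""".splitlines()

result = split_series(block_lines)
print(result["Analytic"])
print(result["Approximated"])
```

Because each value line is matched by its label rather than by its position, this variant would also tolerate the two value lines appearing in the opposite order within a block.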


Answer 5

Score: 1

Using any awk:

$ awk '{n=(NR%4); val[n]=$NF} n==0{print val[1], val[2], val[3]}' file
0.1 340.347685734 332.45634555
0.2 340.936745872 332.57893789
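The one-liner stores the last field of each line under its line number modulo 4 and prints a row whenever that remainder is 0, that is, on every fourth line (the -- separator). A Python sketch of the same bookkeeping, with every_fourth as a hypothetical name, makes one consequence visible: a final block that is not followed by -- produces no row.

```python
def every_fourth(lines):
    """Emit one 'x analytic approx' row each time the line number is a multiple of 4."""
    val, rows = {}, []
    for nr, line in enumerate(lines, start=1):    # nr plays the role of awk's NR
        n = nr % 4
        fields = line.split()
        val[n] = fields[-1] if fields else ""     # like $NF: the last field of the line
        if n == 0:                                # fourth line of the block, the '--'
            rows.append(f"{val[1]} {val[2]} {val[3]}")
    return rows

sample = [
    "0.1", "Analytic value = 340.347685734", "Approximated value = 332.45634555", "--",
    "0.2", "Analytic value = 340.936745872", "Approximated value = 332.57893789", "--",
]
for row in every_fourth(sample):
    print(row)
```

Dropping the last -- from sample leaves only the first row in the output, so with this approach the input should end with a separator line.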

huangapple
  • Posted on 2023-06-08 00:47:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76425507.html