I need a file converter script

Question

I need to convert a file that looks like this:

Fri Apr 14 15:42:02 UTC 2023

MemTotal: 65039504 kB 
MemFree: 41010436 kB 
MemAvailable: 45100588 kB

Fri Apr 14 16:35:01 UTC 2023

MemTotal: 65039504 kB 
MemFree: 40409508 kB 
MemAvailable: 44902852 kB

Fri Apr 14 16:36:01 UTC 2023

MemTotal: 65039504 kB 
MemFree: 40411232 kB 
MemAvailable: 44905376 kB

To something that looks like this:

15:42:02,65039504,41010436,45100588 
16:35:01,65039504,40409508,44902852 
16:36:01,65039504,40411232,44905376

Here's the script that I came up with:

#!/bin/bash
set -x

export TIME
export MEMTOTAL
export MEMFREE
export MEMAVAIL
export FILE="./SMALL-SAMPLE.txt"

while read -a LINE from $FILE
do
    WORD1=${LINE[0]}
    WORD2=${LINE[1]}
    WORD3=${LINE[2]}
    WORD4=${LINE[3]}
    WORD5=${LINE[4]}
    WORD6=${LINE[5]}
    WORD7=${LINE[6]}
    case $WORD1 in
        "Fri")
           TIME=$WORD4
           ;;
        "MemTotal")
           MEMTOTAL=$WORD2
           ;;
        "MemFree")
           MEMFREE=$WORD2
           ;;
        "MemAvailable")
          MEMAVAIL=$WORD2
          ;;
       *)continue;;
    esac
    LINEOUT="$TIME,$MEMTOTAL,$MEMFREE,$MEMAVAIL"
    echo $LINEOUT
done < $FILE

Here's the output:

15:42:02,,, 
16:35:01,,, 
16:36:01,,,

I've got a rookie mistake hidden in that script somewhere...any ideas about why I cannot get my data in?

Answer 1

Score: 2

Because TMTOWTDI, a shorter Perl version:

<file perl -anE '
    push @a, $F[3]||$F[1]||();
    say join ",", splice @a if @a > 3;
'
  • -n makes Perl run the script for each record/line.
  • -a turns on autosplit of records (by whitespace) into array @F
  • Append to @a the first true (non-empty, non-zero) value of $F[3] (timestamp) or $F[1] (value). If neither is set, appending () is a no-op.
  • If @a has 4 elements, print them, and truncate the array.
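
The same accumulate-and-flush idea can be sketched in bash for readers who prefer to stay in the shell. This is only an illustration, not part of the original answer: the function name collect_rows is made up here, and it assumes the input layout shown in the question (a date line followed by three Mem lines, with blank lines in between).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): read the report on
# stdin, collect one value per non-empty line, and print a CSV row once
# four values have been gathered.
collect_rows() {
    local -a fields row=()
    local value
    while read -r -a fields; do
        # Prefer the 4th word (the time on date lines), else the 2nd (the number).
        value=${fields[3]:-${fields[1]:-}}
        [[ -n $value ]] || continue        # blank line: nothing to append
        row+=("$value")
        if (( ${#row[@]} == 4 )); then
            (IFS=,; echo "${row[*]}")      # join the four values with commas
            row=()                         # start the next row
        fi
    done
}
```

It would be called as collect_rows < ./SMALL-SAMPLE.txt.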

Your code has a few errors and stylistic issues.

  • As @barmar stated, you should only print a line of output when you see the MemAvailable line (and so you also don't need the */continue clause)
  • read ... from $FILE doesn't mean what you may think it does.
  • You are not matching the trailing colon.
  • You should normally quote variable use to avoid unintended word-splitting, globbing, etc (although that shouldn't happen here).
  • You shouldn't use all-caps variable names - they are reserved for the system.
  • There is no great advantage to defining new variables that only get used once.
  • No need to export any variables.

file="./SMALL-SAMPLE.txt"

while read -a line
do
   case "${line[0]}" in
      Fri)
         time=${line[3]}
         ;;
      MemTotal:)
         memtotal=${line[1]}
         ;;
      MemFree:)
         memfree=${line[1]}
         ;;
      MemAvailable:)
         memavail=${line[1]}
         echo "$time,$memtotal,$memfree,$memavail"
         ;;
   esac
done < "$file"
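
The missing-colon point is easy to check in isolation. The sketch below splits one sample line the way read -a does and tests it against a case pattern with and without the colon; the helper names first_word and matches are made up for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return the first whitespace-separated word of a line,
# split the same way `read -a` splits it inside the loop.
first_word() {
    local -a words
    read -r -a words <<< "$1"
    printf '%s\n' "${words[0]}"
}

# Hypothetical helper: report whether case pattern $2 matches the first word of $1.
matches() {
    case "$(first_word "$1")" in
        $2) echo yes ;;
        *)  echo no ;;
    esac
}

matches "MemTotal: 65039504 kB" "MemTotal"     # prints "no": the colon is part of the word
matches "MemTotal: 65039504 kB" "MemTotal:"    # prints "yes"
```

Because the first word is "MemTotal:" rather than "MemTotal", the original patterns only ever matched the "Fri" lines, which is why only the time showed up in the output.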

Answer 2

Score: 1

One approach would be:

  1. Remove all empty lines (with sed '/^$/d' $FILE)

  2. Read four lines at once each iteration (repeating read for each one)

  3. Extract the desired field using cut command

$ cat script.sh
#!/bin/bash

FILE="./SMALL-SAMPLE.txt"

while read ltime;
      read lmemt;
      read lmemf;
      read lmema;
do
    TIME=$(echo $ltime | cut -d ' ' -f 4)
    MEMT=$(echo $lmemt | cut -d ' ' -f 2)
    MEMF=$(echo $lmemf | cut -d ' ' -f 2)
    MEMA=$(echo $lmema | cut -d ' ' -f 2)

    echo "$TIME,$MEMT,$MEMF,$MEMA"

done < <(sed '/^$/d' $FILE)

Testing:

$ ./script.sh 
15:42:02,65039504,41010436,45100588
16:35:01,65039504,40409508,44902852
16:36:01,65039504,40411232,44905376
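
A possible variation on the same four-reads-per-iteration idea is to let read split the words itself, which avoids starting echo and cut for every field. This is a sketch only, not the answerer's script: the function name convert_blocks is made up, and it assumes the file layout shown in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert the report file named in $1 to CSV rows.
convert_blocks() {
    local t total free avail
    # Each read consumes one line; "_" soaks up the words that are not needed.
    while read -r _ _ _ t _ &&
          read -r _ total _ &&
          read -r _ free _ &&
          read -r _ avail _
    do
        echo "$t,$total,$free,$avail"
    done < <(sed '/^$/d' "$1")     # blank lines removed, as in the answer above
}
```

It would be called as convert_blocks ./SMALL-SAMPLE.txt. Chaining the reads with && stops the loop as soon as any of the four lines is missing, so an incomplete block at the end of the file produces no partial row.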

Answer 3

Score: 0

perl -ne '
    if (/:[^ ]/) { print /\d+:\d+:\d+/g, "," }
    elsif (/: /) { print /(\d+)/, $. % 6 == 5 ? "\n" : "," }
    ' -- file
  • -n reads the input line by line and runs the code for each line;
  • if the line contains a colon followed by non-space, print the part that contains three groups of digits separated by colons, and a comma;
  • otherwise, if the line contains a colon followed by a space, print the digits it contains, followed by a newline on every last line of a group, otherwise a comma.

Answer 4

Score: 0

You forgot the trailing colons in your case choices ("MemTotal:" instead of "MemTotal"). But there are much simpler and much faster solutions (bash loops are slow).

Example with awk (tested with GNU awk and the awk that comes with macOS Ventura):

$ awk -v RS= -v OFS=, '/^Mem/ {print t,$2,$5,$8;next} {t=$4}' file
15:42:02,65039504,41010436,45100588
16:35:01,65039504,40409508,44902852
16:36:01,65039504,40411232,44905376

Explanations:

  • -v RS= sets the record separator to empty lines.
  • -v OFS=, sets the output field separator to commas.
  • /^Mem/ {print t,$2,$5,$8;next} applies to records starting with Mem, prints the value of variable t and fields 2, 5 and 8 (the 3 sizes in the record), and goes to the next record.
  • {t=$4} stores the fourth field (time) in variable t.
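
To see where the field numbers 2, 5 and 8 come from, the following sketch prints every field of one Mem record in paragraph mode, where newlines also separate fields. The helper name show_fields is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number every field of each blank-line-separated record.
show_fields() {
    awk -v RS= '{ for (i = 1; i <= NF; i++) printf "%d=%s\n", i, $i }'
}

printf 'MemTotal: 65039504 kB\nMemFree: 41010436 kB\nMemAvailable: 45100588 kB\n' |
    show_fields
# 1=MemTotal:
# 2=65039504
# 3=kB
# 4=MemFree:
# 5=41010436
# 6=kB
# 7=MemAvailable:
# 8=45100588
# 9=kB
```

The three sizes land in fields 2, 5 and 8 because each line contributes three fields to the record.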

Answer 5

Score: 0

echo 'Fri Apr 14 15:42:02 UTC 2023

MemTotal: 65039504 kB
MemFree: 41010436 kB
MemAvailable: 45100588 kB

Fri Apr 14 16:35:01 UTC 2023

MemTotal: 65039504 kB
MemFree: 40409508 kB
MemAvailable: 44902852 kB

Fri Apr 14 16:36:01 UTC 2023

MemTotal: 65039504 kB
MemFree: 40411232 kB
MemAvailable: 44905376 kB' | 

> nawk '(NF = NF)^(ORS = NR % 4 ? "," : "\n")' RS='\n+'
> OFS= FS='^(([^: ]+ )+|[^ :]+: )| [[:alpha:]]+.+$'

  • more succinct ::
    > gawk '(ORS=/v/?RS:",")^!(NF+=OFS=_)' FS='^([^:]+|[^ ]+) | [?-|].+$'

  • extreme compression ::
    > mawk 'NF&&/v/*($__==/: /?","$2:$4)'

    15:42:02,65039504,41010436,45100588
    16:35:01,65039504,40409508,44902852
    16:36:01,65039504,40411232,44905376

huangapple
  • Posted on 2023-04-20 07:18:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76059480.html