Merging two files whose columns have different lengths and may contain comments into a CSV-like file

Question

file1 looks like

# dsd
# dsd
1,2,5
2,3,5
1,2,5
2,3,5
3,4,5
3,4,5

file2 looks like

# s
1,2
1,2

I want to merge them to get

# dsd
# dsd
1,2,5,1,2
2,3,5,1,2
1,2,5,,
2,3,5,,
3,4,5,,
3,4,5,,

That is, I want to keep the # comment lines from the first file; after those comment lines, I want to paste the columns of the second file alongside, padding them with empty fields so they match the length of the first file's columns. Any comment lines in the second file should be ignored.

I started with:

 paste $(grep -v '^#' file1) file2

but I get bash: /usr/bin/paste: Argument list too long

I guess this is a job for awk, but I am only familiar with single-file processing, and the only examples I have found deal with files of the same length. Is there an easy way, or does one need a longer bash script, or Python et al.?
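The error comes from the command substitution: $(grep -v '^#' file1) expands to the data lines of file1, and each resulting word becomes a separate argument that paste treats as a file name, so a large file1 exceeds the system's argument-length limit. A minimal sketch of the intended call, assuming bash for the <( ) process substitution, hands each filtered stream to paste as a file instead. Note that plain paste neither keeps file1's comment lines nor pads: once file2 runs out it emits only one trailing comma per line, which is why the answers add explicit padding.

```shell
#!/usr/bin/env bash
# Recreate the sample inputs from the question.
printf '%s\n' '# dsd' '# dsd' 1,2,5 2,3,5 1,2,5 2,3,5 3,4,5 3,4,5 > file1
printf '%s\n' '# s' 1,2 1,2 > file2

# <( ... ) gives paste a file name (such as /dev/fd/63) for each filtered
# stream instead of splicing the lines of file1 into the argument list.
result=$(paste -d, <(grep -v '^#' file1) <(grep -v '^#' file2))
printf '%s\n' "$result"
```

With the sample files this prints 1,2,5,1,2 and 2,3,5,1,2, followed by rows such as 1,2,5, that carry a single trailing comma.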

Answer 1

Score: 2

You may use this awk solution:

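awk -v OFS=, '
NR == FNR {                  # first file read (file2)
   if (!/^#/)                # skip comment lines
      a[++i] = $0            # store data rows by position
   next
}
{                            # second file read (file1)
   if (/^#/)
      print                  # pass comment lines through unchanged
   else {
      ++NR2                  # count data rows of file1
      if (NR2 in a)
         print $0, a[NR2]    # append the matching file2 row
      else
         print $0,"",""      # file2 exhausted: pad with two empty fields
   }
}' file2 file1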

# dsd
# dsd
1,2,5,1,2
2,3,5,1,2
1,2,5,,
2,3,5,,
3,4,5,,
3,4,5,,
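The else branch pads with exactly two empty fields because file2 has two columns in the example. A minimal variant, assuming every data row of file2 has the same number of fields, records that count while reading file2 and builds the padding from it; n, row, pad and k are hypothetical names introduced for this sketch.

```shell
#!/usr/bin/env bash
# Recreate the sample inputs from the question.
printf '%s\n' '# dsd' '# dsd' 1,2,5 2,3,5 1,2,5 2,3,5 3,4,5 3,4,5 > file1
printf '%s\n' '# s' 1,2 1,2 > file2

result=$(awk -F, -v OFS=, '
NR == FNR {                 # first file read (file2)
   if (!/^#/) {
      a[++i] = $0           # store data rows by position
      n = NF                # assumed: all file2 data rows have n fields
   }
   next
}
/^#/ { print; next }        # comment lines of file1 pass through
{
   ++row                    # count data rows of file1
   if (row in a) {
      print $0, a[row]      # append the matching file2 row
   } else {
      pad = ""              # hypothetical helper variable
      for (k = 1; k <= n; k++)
         pad = pad OFS      # one separator per missing field
      print $0 pad
   }
}' file2 file1)
printf '%s\n' "$result"
```

With the sample files the output is the same as above; a wider file2 would simply get more commas.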

Answer 2

Score: 2

Using any awk:

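$ cat tst.awk
BEGIN { FS=OFS="," }
FNR == 1 {
    # new input file: restart the data-row counter
    lineNr = 0
    # on reaching file1, a[1] is the first data row of file2;
    # deleting everything but the separators leaves the padding
    dflt = a[1]
    gsub("[^"FS"]+","",dflt)
}
/^#/ {
    # keep comment lines from file1 only (NR != FNR), drop those of file2
    if ( NR != FNR ) {
        print
    }
    next
}
{ ++lineNr }
NR == FNR {
    # file2: store each data row by position
    a[lineNr] = $0
    next
}
# file1: append the matching file2 row, or the padding once file2 runs out
{ print $0, (lineNr in a ? a[lineNr] : dflt) }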

$ awk -f tst.awk file2 file1
# dsd
# dsd
1,2,5,1,2
2,3,5,1,2
1,2,5,,
2,3,5,,
3,4,5,,
3,4,5,,
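The padding in tst.awk comes from the gsub call: it deletes every run of non-separator characters from the first stored file2 row, leaving only its commas. Because OFS is a comma, print $0, dflt then adds one more comma, so a two-column file2 yields two empty fields. A small standalone illustration, applying the same gsub to a few rows made up for the demo:

```shell
#!/usr/bin/env bash
# Apply the gsub from tst.awk to sample rows of different widths.
result=$(printf '%s\n' 1,2 a,b,c x | awk -F, '{
    d = $0
    gsub("[^"FS"]+", "", d)   # delete every run of non-comma characters
    print $0 " -> [" d "]"
}')
printf '%s\n' "$result"
```

This prints 1,2 -> [,] then a,b,c -> [,,] then x -> [], so for the sample file2 dflt is a single comma and print $0, dflt turns 1,2,5 into 1,2,5,,.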

Answer 3

Score: 1

Using the great Miller, paste, cat and grep, you could run

paste -d ',' <(grep -v '^#' file1.txt) <(grep -v '^#' file2.txt) | mlr --csv -N --ragged cat >output
<file1.txt grep -P '^#' | cat - output > tmp.txt && mv tmp.txt output

to get

# dsd
# dsd
1,2,5,1,2
2,3,5,1,2
1,2,5,,
2,3,5,,
3,4,5,,
3,4,5,,

The steps:

  • merge the two input files horizontally, removing the comment lines (via paste and grep);

  • add the missing commas (via mlr);

  • prepend the comment lines of the first file to the merged output (via grep and cat).
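If Miller is not available, the padding step can be swapped for awk (a substitution, not what this answer uses): buffer the pasted rows, find the widest one, and assign to that field number on every row, which makes awk extend shorter records with empty fields. Grouping the commands in { ...; } also avoids the temporary file. A sketch assuming bash and the file1.txt/file2.txt/output names from the answer; row and max are hypothetical names.

```shell
#!/usr/bin/env bash
# Recreate the sample inputs under the names used in the answer.
printf '%s\n' '# dsd' '# dsd' 1,2,5 2,3,5 1,2,5 2,3,5 3,4,5 3,4,5 > file1.txt
printf '%s\n' '# s' 1,2 1,2 > file2.txt

{
  grep '^#' file1.txt                      # comment lines of file1 first
  paste -d, <(grep -v '^#' file1.txt) <(grep -v '^#' file2.txt) |
    awk -F, -v OFS=, '
      { row[NR] = $0; if (NF > max) max = NF }   # buffer rows, track widest
      END {
        for (r = 1; r <= NR; r++) {
          $0 = row[r]
          $max = $max      # assigning past NF appends empty fields
          print
        }
      }'
} > output

cat output
```

The trade-off is that all merged rows are held in memory until the widest row is known; with a fixed, known width one could instead pass it in with -v and skip the buffering.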

Answer 4

Score: 0

Here is a Ruby solution using the CSV module:

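ruby -r csv -e '
f1=CSV.read(ARGV[0])
# drop comment rows from file2
f2=CSV.read(ARGV[1]).select{|row| !row.join("")[/^\s*#/] }
# prepend one placeholder per leading comment row of file1 (the rows before
# the first row containing a digit) so that zip lines up the data rows
f2=[""]*f1.slice_when{|a,b| b.to_s[/\d/]}.first.length+f2
# widest file2 row = number of empty fields to pad with
f2c=f2.max_by{|row| row.length}.length
puts CSV.generate{|csv|
    f1.zip(f2).each{|row|
        if row.flatten.join("")[/^\s*#/]
            # comment row from file1: print it unchanged
            csv<<row[0]
        elsif row[-1].nil?
            # file2 exhausted: pad with empty fields
            csv<<row[0]+[nil]*f2c
        else
            csv<<row.flatten
        end
    }
}
' file1 file2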

This does not assume that file2 has only two columns.

It DOES assume that file1 is the longer of the two files; that is easily changed if it is not true.

Prints:

# dsd
# dsd
1,2,5,1,2
2,3,5,1,2
1,2,5,,
2,3,5,,
3,4,5,,
3,4,5,,
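For comparison with the "file1 is the longer file" assumption, here is an awk sketch (an addition, not part of this answer) that pads whichever side runs out first: it buffers both files, tracks the widest data row on each side, and emits width-minus-one commas for a missing side. It assumes, as in the example, that file1's comment lines come before its data, since they are printed as soon as they are read. merge.awk, pad, wa, wb, short1 and long2 are hypothetical names, and the reversed-case data is made up for the demo.

```shell
#!/usr/bin/env bash
# merge.awk (hypothetical name): pad whichever file runs out of data rows first.
cat > merge.awk <<'EOF'
BEGIN { FS = OFS = "," }
# Return w empty fields joined by OFS, i.e. w - 1 separators.
function pad(w,    s, k) {
    s = ""
    for (k = 1; k < w; k++) s = s OFS
    return s
}
NR == FNR {                              # file2: keep data rows only
    if (!/^#/) { b[++nb] = $0; if (NF > wb) wb = NF }
    next
}
/^#/ { print; next }                     # file1 comments, printed as read
{ a[++na] = $0; if (NF > wa) wa = NF }   # buffer file1 data rows
END {
    n = (na > nb) ? na : nb
    for (r = 1; r <= n; r++) {
        left  = (r in a) ? a[r] : pad(wa)
        right = (r in b) ? b[r] : pad(wb)
        print left, right
    }
}
EOF

# Sample from the question (file1 is longer).
printf '%s\n' '# dsd' '# dsd' 1,2,5 2,3,5 1,2,5 2,3,5 3,4,5 3,4,5 > file1
printf '%s\n' '# s' 1,2 1,2 > file2
result=$(awk -f merge.awk file2 file1)

# Reversed case: the file1 side (short1) is shorter; made-up demo data.
printf '%s\n' '# c' 1,2,5 > short1
printf '%s\n' 1,2 3,4 > long2
result2=$(awk -f merge.awk long2 short1)

printf '%s\n' "$result" "$result2"
```

For the reversed case the last row comes out as ,,,3,4, i.e. three empty fields standing in for the missing file1 columns.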

huangapple
  • This article was published on April 19, 2023, 22:51:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76055924.html