April 19, 2023, 22:51:00

Merging two files with columns of different lengths and possible comments into a CSV-like file

Question

file1 looks like

# dsd
# dsd
1,2,5
2,3,5
1,2,5
2,3,5
3,4,5
3,4,5

file2 looks like

# s
1,2
1,2

I want to merge them to get

# dsd
# dsd
1,2,5,1,2
2,3,5,1,2
1,2,5,,
2,3,5,,
3,4,5,,
3,4,5,,

That is, I want to keep the comment lines (those starting with #) from the first file. After the comment lines, I want to paste the columns from the second file, padding them to the column length of the first file. If there are any comment lines in the second file, they should be ignored.

I started with:

paste $(grep -v '^#' file1) file2

but I get bash: /usr/bin/paste: Argument list too long

I guess this is a job for awk, but I am only familiar with single-file processing, and I have only found examples that deal with files of the same length. Is there an easy way, or does one need a longer bash script, Python, or similar?

Answer 1

Score: 2

You may use this awk solution:

awk -v OFS=, '
NR == FNR {
   if (!/^#/)
      a[++i] = $0
   next
}
{
   if (/^#/)
      print
   else {
      ++NR2
      if (NR2 in a)
         print $0, a[NR2]
      else
         print $0,"",""
   }
}' file2 file1

# dsd
# dsd
1,2,5,1,2
2,3,5,1,2
1,2,5,,
2,3,5,,
3,4,5,,
3,4,5,,

Answer 2

Score: 2

Using any awk:

$ cat tst.awk
BEGIN { FS=OFS="," }
FNR == 1 {
    lineNr = 0
    dflt = a[1]
    gsub("[^"FS"]+", "", dflt)
}
/^#/ {
    if ( NR != FNR ) {
        print
    }
    next
}
{ ++lineNr }
NR == FNR {
    a[lineNr] = $0
    next
}
{ print $0, (lineNr in a ? a[lineNr] : dflt) }

$ awk -f tst.awk file2 file1
# dsd
# dsd
1,2,5,1,2
2,3,5,1,2
1,2,5,,
2,3,5,,
3,4,5,,
3,4,5,,

Answer 3

Score: 1

Using the great Miller, paste, cat and grep, you could run

paste -d ',' <(grep -v '^#' file1.txt) <(grep -v '^#' file2.txt) | mlr --csv -N --ragged cat >output
<file1.txt grep -P '^#' | cat - output > tmp.txt && mv tmp.txt output

to get

# dsd
# dsd
1,2,5,1,2
2,3,5,1,2
1,2,5,,
2,3,5,,
3,4,5,,
3,4,5,,

The steps:

merge the two input files horizontally, removing the comment lines (via paste and grep);
add the missing commas (via mlr);
add the comment lines of the first file to the merged one (via grep and cat).
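
The three steps above can also be sketched without Miller. This is only an illustration, not part of the original answer: it swaps mlr for a two-pass awk that pads each row to the widest row, and it replaces process substitution with temporary files so that it runs in plain sh. The sample data is copied from the question; the names workdir, body1, body2, merged and padded are made up for this example.

```shell
#!/bin/sh
# Sketch only: paste + grep + awk instead of Miller.
# File and directory names below are assumptions for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input from the question.
printf '%s\n' '# dsd' '# dsd' 1,2,5 2,3,5 1,2,5 2,3,5 3,4,5 3,4,5 > file1
printf '%s\n' '# s' 1,2 1,2 > file2

# Step 1: strip comment lines, then merge the files horizontally.
grep -v '^#' file1 > body1
grep -v '^#' file2 > body2
paste -d ',' body1 body2 > merged

# Step 2: pad every row to the widest row.
# Pass 1 records the largest field count in n; pass 2 assigns to
# field n, which makes awk rebuild the record with empty fields.
awk -F, -v OFS=, '
NR == FNR { if (NF > n) n = NF; next }
{ $n = $n; print }
' merged merged > padded

# Step 3: put the comment lines of file1 back on top.
{ grep '^#' file1; cat padded; } > output
cat output
```

With the sample data this should print the desired result from the question. Reading merged twice keeps the awk part short; for very large inputs a single pass that buffers rows would avoid the second read.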

Answer 4

Score: 0

Here is a Ruby solution using the CSV module:

ruby -r csv -e '
f1=CSV.read(ARGV[0])
f2=CSV.read(ARGV[1]).select{|row| !row.join("")[/^\s*#/] }
f2=[""]*f1.slice_when{|a,b| b.to_s[/\d/]}.first.length+f2
f2c=f2.max_by{|row| row.length}.length
puts CSV.generate{|csv|
    f1.zip(f2).each{|row|
        if row.flatten.join("")[/^\s*#/]
            csv<<row[0]
        elsif row[-1].nil?
            csv<<row[0]+[nil]*f2c
        else
            csv<<row.flatten
        end
    }
}
' file1 file2

This is not limited to the assumption that file2 is only 2 columns.

It DOES assume that file1 is the longer of the two files. Easily changed if that is not true.

Prints:

# dsd
# dsd
1,2,5,1,2
2,3,5,1,2
1,2,5,,
2,3,5,,
3,4,5,,
3,4,5,,
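
The note above says the assumption that file1 is the longer file is easily changed. As a rough sketch of one way to lift it, here is an awk variant rather than a change to the Ruby code. The question does not say what should happen when file2 has extra rows, so the behaviour here is an assumption: those rows are printed at the end, left-padded with one empty field per file1 column. The sketch also assumes file2 has at least one data row and that all rows within a file have the same width. The variable names (b, nb, na, w1, w2) and the reversed sample data are invented for this example.

```shell
#!/bin/sh
# Sketch only: handles either file being the longer one.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Reversed sample: here file2 has more data rows than file1.
printf '%s\n' '# dsd' 1,2,5 2,3,5 > file1
printf '%s\n' '# s' 1,2 1,2 1,2 > file2

awk -F, -v OFS=, '
/^#/ { if (NR != FNR) print; next }        # keep only file1 comments
NR == FNR { b[++nb] = $0; w2 = NF; next }  # file2 data rows and width
{
    ++na; w1 = NF
    if (na in b) { print $0, b[na]; next }
    pad = ""
    for (i = 1; i < w2; i++) pad = pad OFS # stands for w2 empty fields
    print $0, pad
}
END {
    # rows that exist only in file2: left-pad with w1 empty fields
    lead = ""
    for (i = 1; i <= w1; i++) lead = lead OFS
    for (r = na + 1; r <= nb; r++) print lead b[r]
}
' file2 file1 > output
cat output
```

As in the awk answers, file2 is passed first so that its rows are already in memory when file1 is processed.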
