May 20, 2023 23:55:54

How do I remove the last 2 columns from a tsv file?

Question

I am coding in bash in a terminal, through a Docker container on my Mac. I am struggling to figure out how to remove the last 2 columns from my TSV file. It has 7 columns in total; the last 2 are not needed for my work and have to be removed.

Edit:
The first picture is the original data file. The second shows what the code is doing: it is deleting some seemingly random entries from the column. The third picture is what the end result of this program should be. I am also struggling with the month and year columns, but I deleted that code and tried to simplify the data first.

I tried using awk and using NF = NF - 2 which does remove the last 2 columns but for some reason deletes some of the data I have in my 5th column which I need. So whilst I got the column deletion I needed, the code did a little extra. Here is the code:

preprocess() {
    input_file="$1"

    # Extract the base name of the input file
    base_name=$(basename "$input_file" .tsv)

    # Create the new output file name
    output_file="${base_name}_clean.tsv"

    awk -F'\t' 'BEGIN{OFS=FS}
    {
        NF = NF - 2

        print
    }' "$input_file" > "$output_file"
}

I have a few other lines, but they shouldn't cause any issues; they just check that the file exists, etc.

Answer 1

Score: 2

With AWK a script like this works:

Input data (separated by ; here for clarity, but it could just as well be tabs):

F1;F2;F3;F4
V11;V12;V13;V14
V21;V22;V23;V24

Program to convert. It is commented so that everyone can follow, even those new to awk.

BEGIN{
   FS=";"    # change to "\t" for tab separation
   OFS=";"   # set as desired
   skipcolcount=2
}
{
  # In each line, loop over the fields
  for (i=1;i<=NF-skipcolcount;i++) {
     printf "%s", $i             # reference field by index variable
     if (i<NF-skipcolcount) {    # no separator after last field
       printf "%s", OFS
     }
  }
  printf "\n"                    # linefeed after each line
}

Result:

F1;F2
V11;V12
V21;V22
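
For tab-separated input, the same loop can be wrapped in a small shell function. This is only a sketch of the idea above, not the answerer's exact script: the function name `drop_last_cols` and the variable `skip` are made up for illustration, and it prints fields through `printf "%s"` so that a field containing `%` is printed literally.

```shell
# Hypothetical helper: print every field except the last "skip" ones.
# Usage: drop_last_cols N < input.tsv
drop_last_cols() {
    awk -F'\t' -v OFS='\t' -v skip="$1" '{
        last = NF - skip                 # index of the last field to keep
        for (i = 1; i <= last; i++) {
            printf "%s", $i              # "%s" keeps a literal % in the data safe
            if (i < last) printf "%s", OFS
        }
        printf "\n"
    }'
}

# Example with made-up data: four columns in, two columns out.
result=$(printf 'F1\tF2\tF3\tF4\nV11\tV12\tV13\tV14\n' | drop_last_cols 2)
printf '%s\n' "$result"
```

Because the loop only reads fields and never assigns to NF, it does not depend on how a particular awk handles a modified NF.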

Answer 2

Score: 2

> I tried using awk and using NF = NF - 2 which does remove the last 2
> columns but for some reason deletes some of the data I have in my 5th
> column which I need.

This is unexpected to me. I ran your code with GNU Awk 5.1.0 and it works fine; however, you are using

> docker

so maybe this forces the use of an erratic version of awk? Anyway, if your task is given as

> how to remove the last 2 columns on my TSV file. It has 7 total and
> the last 2 are not needed for my work and are required to be removed.

this might be simplified to: get the first 5 columns of a tab-separated file, which can be expressed in awk as

awk 'BEGIN{FS=OFS="\t"}{print $1,$2,$3,$4,$5}' file.tsv

Please run it and report whether the output is as desired.
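
Taking the first 5 columns and dropping the last 2 only give the same result when every row really has 7 fields, so it may help to count the fields per row before choosing either approach. The sketch below is a diagnostic under that assumption, not a confirmed explanation of the problem in the question; `field_counts` is an illustrative name and the sample data is made up.

```shell
# Hypothetical diagnostic: show how many rows have each field count.
# A clean 7-column TSV should produce a single line of output.
field_counts() {
    awk -F'\t' '{ count[NF]++ }
        END { for (n in count) printf "%d fields: %d rows\n", n, count[n] }' "$1"
}

# Example with made-up data: two rows have 7 fields, one has only 6.
sample=$(mktemp)
printf 'a\tb\tc\td\te\tf\tg\n1\t2\t3\t4\t5\t6\t7\nx\ty\tz\tw\tv\tu\n' > "$sample"
report=$(field_counts "$sample" | sort)
printf '%s\n' "$report"
rm -f "$sample"
```

If more than one field count shows up, removing "the last 2 fields" will cut into different columns on different rows.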

Answer 3

Score: 1

The easiest is a rev/cut/rev combination:

$ rev inputfile | cut -f3- | rev > output.file
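
The trick works because reversing each line turns the last two columns into the first two, so `cut -f3-` can drop them before the second `rev` restores the original order (the default delimiter of `cut` is a tab). A small sketch with made-up data, assuming `rev` is installed in the container:

```shell
# Made-up 5-column sample; the last two columns (d, e and 4, 5) should go.
trimmed=$(printf 'a\tb\tc\td\te\n1\t2\t3\t4\t5\n' | rev | cut -f3- | rev)
printf '%s\n' "$trimmed"
```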

Answer 4

Score: 1

You are close.

Given the following TSV file:

cat file
1	2	3	4	5
6	7	8	9	10
11	12	13	14	15

You can do this in awk:

awk 'BEGIN{FS=OFS="\t"}
{NF=3} 1
' file

Prints:

1	2	3
6	7	8
11	12	13

Or Ruby:

ruby -ne 'puts $_.split("\t")[0..2].join("\t")' file
# same

Or Perl:

perl -nE 'say join("\t", (split "\t")[0..2])' file
# same
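
Applied to the question's case (drop the last 2 columns, whatever the total), the awk variant might look like the sketch below. The `NF > 2` guard is an addition that is not part of the answer: it leaves rows with 2 or fewer fields unchanged instead of assigning a zero or negative NF. The sample data is made up, and the sketch assumes an awk that rebuilds the record when NF is decreased, as gawk does.

```shell
# Drop the last two tab-separated fields, but only on rows that have more than two.
shortened=$(printf 'a\tb\tc\td\te\tf\tg\nshort\n' |
    awk 'BEGIN{FS=OFS="\t"} NF > 2 {NF -= 2} 1')
printf '%s\n' "$shortened"
```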

Answer 5

Score: 1

One way could be to use sed:

sed 's/\t[^\t]*\t[^\t]*$//' "$input_file" > "$output_file"

Match explained:

\t - a tab character
[^\t]* - zero or more non-tab characters
\t - a tab character
[^\t]* - zero or more non-tab characters
$ - end of line anchor

Substitute (the // part) with an empty string.

An alternative:

sed -E 's/(\t[^\t]*){2}$//' "$input_file" > "$output_file"

Here the match \t[^\t]* is in a group (...) which is repeated twice {2}.
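
The `\t` escape inside a sed pattern is understood by GNU sed but, as far as I know, not by every sed implementation. The sketch below is a variation on the first command, not the answerer's code: it puts a literal tab character into the pattern through a shell variable (`tab` is an illustrative name), which should behave the same way regardless of which sed is installed. The sample data is made up.

```shell
# Store a literal tab so the pattern does not rely on sed understanding "\t".
tab=$(printf '\t')

# Made-up 5-column sample; remove the last two tab-separated fields.
cleaned=$(printf 'a\tb\tc\td\te\n1\t2\t3\t4\t5\n' |
    sed "s/${tab}[^${tab}]*${tab}[^${tab}]*\$//")
printf '%s\n' "$cleaned"
```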
