Error substituting elements of column awk

Question

I'm using the Titanic CSV and I've been trying to substitute the values of the 5th and 12th columns, Sex and Embarked. The Sex values should be m/f instead of male/female, and the 12th column should hold the full name of the port instead of its first letter.

The CSV originally looks like this:

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,Nan,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.28,C85,C
...
886,0,3,"Rice, Mrs. William (Margaret Norton)",female,39,0,5,382652,29.12,Nan,Q
890,1,1,"Behr, Mr. Karl Howell",male,26,0,0,111369,30.00,C148,C
891,0,3,"Dooley, Mr. Patrick",male,32,0,0,370376,7.75,Nan,Q

And it should look like this after the modifications:

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",m,22,1,0,A/5 21171,7.25,Nan,Southampton
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",f,38,1,0,PC 17599,71.28,C85,Cherbourg
...
886,0,3,"Rice, Mrs. William (Margaret Norton)",f,39,0,5,382652,29.12,Nan,Queenstown
890,1,1,"Behr, Mr. Karl Howell",m,26,0,0,111369,30.00,C148,Cherbourg
891,0,3,"Dooley, Mr. Patrick",m,32,0,0,370376,7.75,Nan,Queenstown

But it doesn't substitute the values of the 12th column except in the last row, while the Sex column is substituted correctly:

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",m,22,1,0,A/5 21171,7.25,Nan,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",f,38,1,0,PC 17599,71.28,C85,C
...
886,0,3,"Rice, Mrs. William (Margaret Norton)",f,39,0,5,382652,29.12,Nan,Q
890,1,1,"Behr, Mr. Karl Howell",m,26,0,0,111369,30.00,C148,C
891,0,3,"Dooley, Mr. Patrick",m,32,0,0,370376,7.75,Nan,Queenstown

The script is the following:

BEGIN {
    FPAT = "([^,]*)|(\"[^\"]+\")"
    OFS = ","
}

{
    # Change the value of the Sex column to "f" if it is "female" or "m" if it is "male"
    if ($5 == "female")
        $5 = "f"
    else if ($5 == "male")
        $5 = "m"

    # Perform the substitution in the Embarked column
    if ($12 == "C")
        $12 = "Cherbourg"
    else if ($12 == "Q")
        $12 = "Queenstown"
    else if ($12 == "S")
        $12 = "Southampton"

    print $0
}

To clarify, none of the values in the 12th column contains spaces or other characters that would make the match fail; in Python the substitution works fine.

Answer 1

Score: 2

I suspect you are running this on macOS (or possibly FreeBSD, which is where the macOS version of awk originally came from). Explicitly choosing GNU awk on my FreeBSD box gives me what you want.

[dev ~/test/awktest]$ gawk -f code.awk data.txt
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",m,22,1,0,A/5 21171,7.25,Nan,Southampton
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",f,38,1,0,PC 17599,71.28,C85,Cherbourg
...
886,0,3,"Rice, Mrs. William (Margaret Norton)",f,39,0,5,382652,29.12,Nan,Queenstown
890,1,1,"Behr, Mr. Karl Howell",m,26,0,0,111369,30.00,C148,Cherbourg
891,0,3,"Dooley, Mr. Patrick",m,32,0,0,370376,7.75,Nan,Queenstown

(Admittedly, running FreeBSD awk doesn't get either substitution right...)
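
One possible explanation for that last remark: FPAT is a GNU awk extension, so an awk that does not implement it would treat the assignment as an ordinary variable and keep splitting fields on whitespace. The Python sketch below only imitates that default splitting on the first data row. It is an illustration under that assumption, not a reimplementation of any awk, and `whitespace_fields` is a name made up for the example.

```python
# Rough imitation of awk's default field splitting (runs of blanks),
# which is assumed to apply when FPAT is not understood.
line = '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,Nan,S'


def whitespace_fields(record):
    """Split a record on runs of whitespace, like awk's default FS."""
    return record.split()


fields = whitespace_fields(line)

# awk fields are 1-based; a missing field compares as the empty string.
field5 = fields[4] if len(fields) >= 5 else ""
field12 = fields[11] if len(fields) >= 12 else ""

print(len(fields))    # number of whitespace-separated fields
print(repr(field5))   # what the script would compare against "male"/"female"
print(repr(field12))  # what the script would compare against "C"/"Q"/"S"
```

With this splitting, neither field 5 nor field 12 holds the Sex or Embarked value, so neither comparison in the script can match.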

Answer 2

Score: 2

Since a change in the middle of the row works, but not at the end, I suspect line endings are your problem. A trailing \r (carriage return) on each line would explain your symptoms.
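
To make the suspected failure concrete, here is a minimal Python sketch. It assumes the file uses CRLF line endings and that the last record has no terminator, which would mirror the reported symptom. The shortened sample rows, `ports`, and `expand_last_field` are invented for the illustration.

```python
# Simulated CRLF content; the last record has no line terminator
# (an assumption chosen to mirror the reported symptom).
raw = "886,female,Q\r\n890,male,C\r\n891,male,Q"

ports = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}


def expand_last_field(record, strip_cr=False):
    """Replace the last comma-separated field with the full port name."""
    if strip_cr:
        record = record.rstrip("\r")
    fields = record.split(",")
    # "Q\r" is not a key of ports, so it is left unchanged.
    fields[-1] = ports.get(fields[-1], fields[-1])
    return ",".join(fields)


# Splitting on "\n" only, as a Unix tool would, leaves "\r" on each record.
records = raw.split("\n")
naive = [expand_last_field(r) for r in records]
fixed = [expand_last_field(r, strip_cr=True) for r in records]

print(naive)  # only the last record is expanded
print(fixed)  # every record is expanded once "\r" is stripped
```

In the awk script, the corresponding fix would presumably be to strip the carriage return before the comparisons, for example with `sub(/\r$/, "")` at the top of the main block.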


A more robust approach to manipulate CSV is to use a tool that already has a full CSV parser built into it.

Python has CSV support; alternatively, sqlite3 is widely available:

#!/bin/sh

# Import orig.csv into table t, rewrite two columns, and write new.csv.
sqlite3 >"new.csv" <<'EOD'
.mode csv
.headers on
.import "orig.csv" t
-- Single quotes denote SQL string literals; double quotes denote identifiers.
update t set
    sex = case
            when sex='female' then 'f'
            when sex='male'   then 'm'
            else sex
        end,
    embarked = case
            when embarked='C' then 'Cherbourg'
            when embarked='Q' then 'Queenstown'
            when embarked='S' then 'Southampton'
            else embarked
        end
;
select * from t;
EOD
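
For the Python route mentioned above, a minimal sketch using the standard `csv` module might look like the following. The function name `convert`, the lookup tables, and the two-row in-memory sample are assumptions made for the illustration; only the column names come from the question's header row.

```python
import csv
import io

SEX = {"female": "f", "male": "m"}
PORTS = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}


def convert(src, dst):
    """Copy CSV rows from src to dst, rewriting the Sex and Embarked columns."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Unknown values are passed through unchanged.
        row["Sex"] = SEX.get(row["Sex"], row["Sex"])
        row["Embarked"] = PORTS.get(row["Embarked"], row["Embarked"])
        writer.writerow(row)


# In-memory sample with CRLF line endings, standing in for the real file.
sample = (
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\r\n"
    '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,Nan,S\r\n'
    '891,0,3,"Dooley, Mr. Patrick",male,32,0,0,370376,7.75,Nan,Q\r\n'
)
out = io.StringIO()
convert(io.StringIO(sample), out)
result = out.getvalue()
print(result)
```

The CSV parser handles both the quoted Name field and the CRLF line endings, so no manual stripping is needed. For real files, the `csv` documentation recommends opening them with `newline=""`.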

huangapple
  • Published on June 29, 2023 at 21:37:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76581584.html