Using FPAT to split a CSV file and replace any embedded commas with a space " "
Question
I have a text file with a sample record that looks like this...
Tampa,Orlando,"Jacksonville,FL",Miami,"Tallahassee,FL"
I need to replace the embedded commas in fields 3 and 5 with a space (" ").
Here is the awk code I have in a bash script...
AWK_script="BEGIN {
    OFS=\",\"
}
{
    for (i=1; i<=NF; i++)
    {
        if ( $i==3 || $i==5 )
        {
            gsub(\",\",\" \",$i)
        }
    }
    print \$0
}
"
echo 'Tampa,Orlando,"Jacksonville,FL",Miami,"Tallahassee,FL"' | awk -vFPAT='([^,]*)|("[^"]+")' "${AWK_script}"
I can't get gsub to replace the embedded commas with a space (" "). Any help would be greatly appreciated.
Answer 1
Score: 2
You're making things much harder for yourself by wrapping the awk script in double quotes and storing it in a string. Strings are for storing text; functions are for storing code. As a rule, enclose every string or script in single quotes unless you need double quotes (for example, to let the shell expand a variable), and use double quotes unless you need no quotes at all. Here, put the script in single quotes and store it in a function instead. Among other things, that lets you get rid of all those backslashes before the double quotes and $ signs inside the awk script.
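To make the quoting point concrete, here is a small sketch of what the shell does to an unescaped $i in each kind of quote. The variable names `double` and `single` are made up for illustration, and the sketch assumes the shell variable i is unset, as it appears to be in the question's script.

```shell
#!/usr/bin/env bash
# Illustration only: the variable names are hypothetical.
unset i                      # no shell variable i, as in the question

double="{ print $i }"        # the shell expands $i to nothing before awk sees it
single='{ print $i }'        # the text reaches awk unchanged

printf 'double-quoted: %s\n' "$double"
printf 'single-quoted: %s\n' "$single"
```

In the double-quoted version awk never sees $i at all, which is why every $ and " in the question's script would need a backslash.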
I think this is what you're trying to do:
$ cat tst.sh
#!/usr/bin/env bash
deComma() {
    # A field is either a run of non-commas or a double-quoted string
    # in which "" stands for an escaped quote.
    awk -v FPAT='([^,]*)|("([^"]|"")*")' -v OFS=',' '
    {
        # Replace commas inside fields 3 and 5 only.
        for (i=3; i<=5; i+=2) {
            gsub(/,/," ",$i)
        }
        print
    }
    ' "${@:--}"
}
echo 'Tampa,Orlando,"Jacksonville,FL",Miami,"Tallahassee,FL"' | deComma

$ ./tst.sh
Tampa,Orlando,"Jacksonville FL",Miami,"Tallahassee FL"
See https://stackoverflow.com/questions/45420535/whats-the-most-robust-way-to-efficiently-parse-csv-using-awk for more information on parsing CSVs with awk, including why I changed your FPAT setting.