Removing "," (comma) value from the columns of a CSV file using linux shell
Question
I have a CSV file with 5 columns, one of which holds integer values formatted like 1,564, 4,789, 9,765. I need to remove the "," (comma) from col3 so that the values look like 1564, 4789, 9765.
I tried various approaches with "awk" and "sed", but none of them worked.
Below is the sample input and expected output.
Input CSV file:
col1,col2,col3,col4,col5
Abcd,defg,"1,432",hjik,khuj
Output CSV file:
col1,col2,col3,col4,col5
Abcd,defg,"1432",hjik,khuj
Answer 1
Score: 1
This regex might help you find the unquoted fields:
(\d{1,3}(?:,\d{3})+)
and with a simple replacement, like "\1", properly quote them.
It looks for "1-3 digits, followed by one or more sequences of a comma and three digits".
I say "might" because you've only shown one small sample of the data, and this only works on numbers sensibly separated with commas.
I started with this broken CSV:
col1,col2,col3
a,1,b
a,12,b
a,123,b
a,1,234,b
a,12,345,b
a,123,456,b
a,1,234,567,b
and got this:
| c1 | c2 | c3 |
|----|-----------|----|
| a | 1 | b |
| a | 12 | b |
| a | 123 | b |
| a | 1,234 | b |
| a | 12,345 | b |
| a | 123,456 | b |
| a | 1,234,567 | b |
You can see it in action here: regexr.com/7ht81.
This small Python program will fix my sample CSV:
import re

lines: list[str] = []
with open("input.csv", encoding="utf-8") as f:
    for line in f:
        # Wrap every comma-grouped number in double quotes.
        fixed = re.sub(r"(\d{1,3}(?:,\d{3})+)", '"\\1"', line)
        lines.append(fixed)

with open("output.csv", "w", encoding="utf-8") as f:
    f.writelines(lines)
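Note that in the question's sample the number is already quoted ("1,432"), so the goal there is to drop the comma rather than to add quotes. A minimal Python sketch of that variant, reusing the same digit-group pattern, might look like this. The function name strip_thousands_commas and the assumption that every such number sits between double quotes are illustrative and not part of the answer above.

```python
import re

# Same digit-group pattern as above, but anchored between double quotes
# (assumption: the numeric field is already quoted, as in the question).
QUOTED_NUMBER = re.compile(r'"(\d{1,3}(?:,\d{3})+)"')


def strip_thousands_commas(line: str) -> str:
    """Remove commas inside quoted numbers such as "1,432", keeping the quotes."""
    return QUOTED_NUMBER.sub(
        lambda m: '"' + m.group(1).replace(",", "") + '"', line
    )


print(strip_thousands_commas('Abcd,defg,"1,432",hjik,khuj'))
# prints: Abcd,defg,"1432",hjik,khuj
```

Lines without a quoted, comma-grouped number (such as the header) pass through unchanged.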
Answer 2
Score: 1
First, replace the commas you want to keep with \r, then remove the others.
Finally, restore the commas.
sed -r 's/([^,]*),([^,]*),(.*),([^,]*),([^,]*)/\1\r\2\r\3\r\4\r\5/;s/,//g;s/\r/,/g' input.csv
EDIT:
As Zach commented, you might want to replace every \r in the command with |||||. On Linux you expect only \n line endings, but CSV files are often generated on Windows, where lines end in \r\n.
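The three steps (mark the separators to keep, delete the remaining commas, restore the separators) can also be sketched in Python. This is only an illustration of the same placeholder idea. The function name remove_inner_commas and the use of a NUL character as the placeholder are assumptions, chosen so the marker cannot collide with a Windows \r line ending.

```python
import re

# Five fields; the greedy middle group absorbs any extra commas in col3.
FIVE_FIELDS = re.compile(r"([^,]*),([^,]*),(.*),([^,]*),([^,]*)")
PLACEHOLDER = "\x00"  # assumption: never occurs in the data


def remove_inner_commas(line: str) -> str:
    """Keep the four field separators and drop every other comma."""
    body = line.rstrip("\r\n")
    # Step 1: replace the four real separators with the placeholder.
    marked = FIVE_FIELDS.sub(lambda m: PLACEHOLDER.join(m.groups()), body, count=1)
    # Step 2: remove the commas that are left (those inside col3).
    stripped = marked.replace(",", "")
    # Step 3: turn the placeholders back into commas.
    return stripped.replace(PLACEHOLDER, ",")


print(remove_inner_commas('Abcd,defg,"1,432",hjik,khuj\r\n'))
# prints: Abcd,defg,"1432",hjik,khuj
```

Like the sed command, this relies on every line having exactly five fields with the extra commas confined to the third one. A line with fewer than four commas does not match the pattern and would lose all of its commas.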
Answer 3
Score: 0
You should use a CSV-aware tool such as Miller.
Running
mlr --csv put '$col3=gsub($col3,",","")' input.csv >output.csv
you get
col1,col2,col3,col4,col5
Abcd,defg,1432,hjik,khuj
- put is the verb to apply functions.
- gsub is the function that substitutes , with nothing in col3.
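If Miller is not available, the same CSV-aware idea can be sketched with Python's standard csv module. This is an illustrative alternative and not part of the answer above. The function name strip_commas_in_column and the in-memory sample are assumptions.

```python
import csv
import io


def strip_commas_in_column(src, dst, column: str = "col3") -> None:
    """Copy CSV rows from src to dst, removing commas from one named column."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # The parser has already removed the quotes, so only the value is edited.
        row[column] = row[column].replace(",", "")
        writer.writerow(row)


sample = io.StringIO('col1,col2,col3,col4,col5\nAbcd,defg,"1,432",hjik,khuj\n')
result = io.StringIO()
strip_commas_in_column(sample, result)
print(result.getvalue(), end="")
# prints:
# col1,col2,col3,col4,col5
# Abcd,defg,1432,hjik,khuj
```

As with the Miller output, the value is written without quotes because it no longer contains a comma. For real files, pass handles opened with newline="" as the csv module documentation recommends.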
Answer 4
Score: -3
Use the script below to replace commas with a space, a dot, or anything else:
#!/bin/sh
read filename
# replace all occurrences (g flag) of comma with .
sed 's/,/./g' "$filename" >> output.csv