Removing "," (comma) from the values in a column of a CSV file using the Linux shell

Question

I have a CSV file with 5 columns, one of which holds integer values written with thousands separators, such as 1,564, 4,789 and 9,765. I need to remove the "," (comma) from col3 so that the values look like 1564, 4789 and 9765.

I tried various approaches using awk and sed, but none of them worked.

Below is the sample input and expected output.

Input CSV file:

col1,col2,col3,col4,col5
Abcd,defg,"1,432",hjik,khuj

Output CSV file:

col1,col2,col3,col4,col5
Abcd,defg,"1432",hjik,khuj

Answer 1

Score: 1

This regex might help you find the unquoted fields:

(\d{1,3}(?:,\d{3})+)

and with a simple replacement, like "\1", properly quote them.

It looks for "1-3 digits, followed by some number of sequences of a comma and three digits".

I say might because you've only shown one small sample of the data, and this only works on numbers sensibly separated with commas.

I started with this broken CSV:

col1,col2,col3
a,1,b
a,12,b
a,123,b
a,1,234,b
a,12,345,b
a,123,456,b
a,1,234,567,b

and got this:

| c1 | c2        | c3 |
|----|-----------|----|
| a  | 1         | b  |
| a  | 12        | b  |
| a  | 123       | b  |
| a  | 1,234     | b  |
| a  | 12,345    | b  |
| a  | 123,456   | b  |
| a  | 1,234,567 | b  |

You can see it in action at regexr.com/7ht81.

This small Python program will fix my sample CSV:

import re

lines: list[str] = []
with open("input.csv", encoding="utf-8") as f:
    for line in f:
        # Wrap each comma-grouped number in double quotes so it stays one field.
        fixed = re.sub(r"(\d{1,3}(?:,\d{3})+)", r'"\1"', line)
        lines.append(fixed)

with open("output.csv", "w", encoding="utf-8") as f:
    f.writelines(lines)
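
As a quick check of the pattern, the sketch below applies the same substitution to a few of the sample lines in memory and then parses the result with Python's standard csv module. The helper name quote_grouped_numbers is hypothetical, and the sketch assumes that the only commas that are not field separators sit inside digit groups such as 1,234. Two genuinely separate numeric fields like 12 and 345 would be merged by mistake.

```python
import csv
import io
import re

# Same pattern as above: 1-3 digits followed by one or more ",ddd" groups.
GROUPED_NUMBER = re.compile(r"(\d{1,3}(?:,\d{3})+)")


def quote_grouped_numbers(line: str) -> str:
    """Wrap every comma-grouped number in double quotes (hypothetical helper)."""
    return GROUPED_NUMBER.sub(r'"\1"', line)


broken = "col1,col2,col3\na,12,b\na,1,234,b\na,1,234,567,b\n"
fixed = "".join(
    quote_grouped_numbers(line) for line in broken.splitlines(keepends=True)
)

# After quoting, a CSV parser should see three fields on every row.
rows = list(csv.reader(io.StringIO(fixed)))
for row in rows:
    print(row)
```

If every printed row has three entries, the quoting has restored the intended column layout for this sample.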

Answer 2

Score: 1

First replace the commas you want to keep with \r, then remove the others.
Finally, restore the commas.

sed -r 's/([^,]*),([^,]*),(.*),([^,]*),([^,]*)/\1\r\2\r\3\r\4\r\5/;s/,//g;s/\r/,/g' input.csv

EDIT:
As Zach commented, you might want to use a placeholder such as ||||| instead of \r. On Linux you expect only \n line endings, but CSV files are often generated on Windows, where lines end in \r\n.
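
For readers who prefer to see the three steps spelled out, here is a minimal Python sketch of the same placeholder idea. The function name strip_inner_commas is hypothetical, the placeholder is the ||||| suggested in the edit, and the sketch assumes exactly five fields per line, with the placeholder never occurring in the data.

```python
import re

# Placeholder that stands in for the separator commas (assumed absent from the data).
PLACEHOLDER = "|||||"

# Same shape as the sed pattern: two fields, a greedy middle field, two fields.
FIVE_FIELDS = re.compile(r"([^,]*),([^,]*),(.*),([^,]*),([^,]*)")


def strip_inner_commas(line: str) -> str:
    """Remove commas inside the third of five fields, keeping the separators."""
    # Step 1: rejoin the five captured fields with the placeholder,
    # which protects the four separator commas.
    marked = FIVE_FIELDS.sub(lambda m: PLACEHOLDER.join(m.groups()), line, count=1)
    # Step 2: every comma still present belongs to the middle field, so drop it.
    # Step 3: turn the placeholders back into commas.
    return marked.replace(",", "").replace(PLACEHOLDER, ",")


print(strip_inner_commas('Abcd,defg,"1,432",hjik,khuj'))
```

Because the placeholder is not \r, a Windows line ending on the input line passes through untouched. As with the sed command, a line that does not match the five-field pattern loses all of its commas.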

Answer 3

Score: 0

You should use a CSV-aware tool such as Miller.

Running

mlr --csv put '$col3=gsub($col3,",","")' input.csv >output.csv

you get

col1,col2,col3,col4,col5
Abcd,defg,1432,hjik,khuj
  • put is the verb to apply functions.
  • gsub is the function that substitutes , with nothing in col3.
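
If Miller is not available, the same CSV-aware idea can be sketched with Python's standard csv module. This is only an illustration, not part of the Miller answer: the function name strip_commas_in_column is hypothetical, and it assumes the file has a header row and that every data row contains the named column.

```python
import csv
import io


def strip_commas_in_column(text: str, column: str = "col3") -> str:
    """Return CSV text with commas removed from one named column."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # The parser has already removed the quotes, so only the
        # commas inside the value itself are left to strip.
        row[column] = row[column].replace(",", "")
        writer.writerow(row)
    return out.getvalue()


sample = 'col1,col2,col3,col4,col5\nAbcd,defg,"1,432",hjik,khuj\n'
print(strip_commas_in_column(sample), end="")
# Prints:
# col1,col2,col3,col4,col5
# Abcd,defg,1432,hjik,khuj
```

As in the Miller output, the value is written without quotes, because it no longer contains a comma that would need quoting.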

Answer 4

Score: -3

Use the script below to replace every comma with a space, a dot, or any other character:

#!/bin/sh

read filename

# Replace every occurrence (g flag) of a comma with a dot, field separators included.
sed 's/,/./g' "$filename" >> output.csv

huangapple
  • Posted on July 31, 2023 at 23:31:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76805101.html