Adding leading zeros to a column in a CSV file

Question

I have a CSV file with three columns of data. I want to format the third column by adding leading zeros and write the result to a new file.

A B C
9 1 33
8 1 82
6 1 5
3 1 481

I want it to look like this instead:

A B C
9 1 0033
8 1 0082
6 1 0005
3 1 0481

I am fairly new to coding so any help would be greatly appreciated!

Answer 1

Score: 2

In a pure *Python* approach, I would use [`writer`][1] and [`zfill`][2] this way:

    import csv

    # Read the header and pad the third column of every row to 4 characters
    with open("file.csv", "r", newline="") as in_file:
        reader = csv.reader(in_file)
        header = next(reader)
        rows = [[r[0], r[1], r[2].zfill(4)] for r in reader]

    # Write the header and the padded rows to a new file
    with open("output.csv", "w", newline="") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(header)
        writer.writerows(rows)

With pandas, you can try this:

    # pip install pandas
    import pandas as pd

    (
        pd.read_csv("file.csv", sep=",")
        .assign(C=lambda x: x["C"].astype(str).str.zfill(4))
        .to_csv("output.csv", index=False, sep=",")  # <- adjust the output separator here
    )

Output (*output.csv*):

    A,B,C
    9,1,0033
    8,1,0082
    6,1,0005
    3,1,0481


  [1]: https://docs.python.org/3/library/csv.html#csv.writer
  [2]: https://docs.python.org/3/library/stdtypes.html#str.zfill
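
If the column is easier to address by name than by position, or the file is too large to collect into a list first, the same idea can be written as a row-by-row stream. The following is only a minimal sketch and not part of the original answer: `pad_column` and its parameters are hypothetical names, and it assumes a comma-delimited file with a header row.

```python
import csv
import io


def pad_column(src, dst, column, width):
    """Copy CSV rows from src to dst, left-padding one column with zeros.

    pad_column is a hypothetical helper; src and dst are open text files.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # csv yields every field as a string, so zfill can be applied directly
        row[column] = row[column].zfill(width)
        writer.writerow(row)


# In-memory demonstration using the sample data from the question
source = io.StringIO("A,B,C\n9,1,33\n8,1,82\n6,1,5\n3,1,481\n")
target = io.StringIO()
pad_column(source, target, column="C", width=4)
print(target.getvalue())
```

With real files, the two handles would come from `open(..., newline="")`, as in the pure Python version of the answer.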



Answer 2

Score: 1

This is a very naïve method, but I hope it helps. Let's start from the basics.

    import pandas as pd

    data = {
        "a": [9, 8, 6, 3],
        "b": [1, 1, 1, 1],
        "c": [33, 82, 5, 481],
    }
    # Load the data into a DataFrame object
    df = pd.DataFrame(data)

To loop through the elements of a column, you can use this:

    for x in df.c:
        print(x)  # prints 33, 82, 5, 481, one per line

Now, if you want to add a fixed number of zeros to the column, you can:

1. Create an empty list.
2. Append every column value, prefixed with the zeros, to the list.
3. Replace the column's values with the list.

Code for these three steps:

    n = []
    for x in df.c:
        n.append("00" + str(x))  # prefix each value with two zeros
    print(n)  # ['0033', '0082', '005', '00481']
    df.c = n  # replace the column's values with the list

A shorter method is to use a list comprehension (starting again from the original `df`):

    n = ["00" + str(x) for x in df.c]
    print(n)  # ['0033', '0082', '005', '00481']
    df.c = n

A fixed prefix gives values of different lengths, though. Since the last column in your question always has 4 characters, I used this small piece of logic (again on the original `df`) to choose the number of zeros from the length of each value:

    n = []
    for x in df.c:
        if len(str(x)) == 0:
            n.append("0000" + str(x))
        elif len(str(x)) == 1:
            n.append("000" + str(x))
        elif len(str(x)) == 2:
            n.append("00" + str(x))
        elif len(str(x)) == 3:
            n.append("0" + str(x))
        else:
            n.append(str(x))  # already 4 or more characters

    print(n)  # ['0033', '0082', '0005', '0481']

    df.c = n
    print(df)

    #    a  b     c
    # 0  9  1  0033
    # 1  8  1  0082
    # 2  6  1  0005
    # 3  3  1  0481
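
The if/elif chain above amounts to left-padding each value to a fixed width, which Python's built-in string tools already do. A short sketch for comparison, using a plain list in place of the DataFrame column (the names `values` and `width` are illustrative, not from the answer):

```python
values = [33, 82, 5, 481]
width = 4

# str.zfill pads a string on the left with zeros up to the given width
with_zfill = [str(x).zfill(width) for x in values]

# The format spec "0{width}d" does the same for integers
with_format = [f"{x:0{width}d}" for x in values]

# Values already longer than the width are left unchanged, not truncated
too_long = str(12345).zfill(width)

print(with_zfill)   # ['0033', '0082', '0005', '0481']
print(with_format)  # ['0033', '0082', '0005', '0481']
print(too_long)     # 12345
```

Either form removes the need for one branch per length, and changing `width` is enough to pad to a different size.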

To save it to a file, you can use the following (`index=False` leaves out the row index, which `to_csv` writes by default):

    df.to_csv("new_data.csv", index=False)
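
One thing to watch when the file is read back: the padding only survives if the column is treated as text. A minimal sketch with the standard `csv` module, where an in-memory buffer stands in for `new_data.csv` (an assumption made to keep the example self-contained):

```python
import csv
import io

# Write padded values to an in-memory buffer instead of a real file
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["a", "b", "c"])
writer.writerows([[9, 1, "0033"], [6, 1, "0005"]])

# csv.reader returns every field as a string, so the zeros are preserved
buffer.seek(0)
rows = list(csv.reader(buffer))
print(rows[1][2])       # 0033

# Converting the field to a number discards the padding again
print(int(rows[1][2]))  # 33
```

pandas infers numeric types by default when reading, so passing something like `dtype={"c": str}` to `pd.read_csv` is one way to keep the column as text there.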

Hope this helps you!

huangapple
  • Posted on April 20, 2023, 00:43:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76056966.html