Adding leading zeros to a column in csv file
Question
I have a csv file with 3 columns of data. I want to format the third column by adding leading zeros to it and write it to a new file.
A | B | C |
---|---|---|
9 | 1 | 33 |
8 | 1 | 82 |
6 | 1 | 5 |
3 | 1 | 481 |
I want it to look like this instead:
A | B | C |
---|---|---|
9 | 1 | 0033 |
8 | 1 | 0082 |
6 | 1 | 0005 |
3 | 1 | 0481 |
I am fairly new to coding so any help would be greatly appreciated!
# Answer 1
**Score**: 2
In a pure *Python* approach, I would use [`writer`][1] and [`zfill`][2] this way:

```python
import csv

# Read every row and pad the third column to four characters.
with open("file.csv", "r", newline="") as in_file:
    reader = csv.reader(in_file)
    header = next(reader)  # keep the header row unchanged
    rows = [[r[0], r[1], r[2].zfill(4)] for r in reader]

# Write the header and the padded rows to a new file.
with open("output.csv", "w", newline="") as out_file:
    writer = csv.writer(out_file)
    writer.writerow(header)
    writer.writerows(rows)
```

With pandas, you can try this:

```python
# pip install pandas
import pandas as pd

(
    pd.read_csv("file.csv", sep=",")
    .assign(C=lambda x: x["C"].astype(str).str.zfill(4))
    .to_csv("output.csv", index=False, sep=",")  # <- adjust the new sep/delimiter here
)
```

Output (*output.csv*):

```
A,B,C
9,1,0033
8,1,0082
6,1,0005
3,1,0481
```

[1]: https://docs.python.org/3/library/csv.html#csv.writer
[2]: https://docs.python.org/3/library/stdtypes.html#str.zfill
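As a side note on what `zfill(4)` does at the edges, here is a minimal sketch. The sample values `"12345"` and `"-5"` are made up for illustration and do not come from the question's data. It suggests that `zfill` pads to a minimum width without truncating longer values, and that a format spec is an alternative when the values are already integers.

```python
# zfill pads to a minimum width and never truncates longer values.
# "12345" and "-5" are hypothetical inputs, not from the question's file.
samples = ["33", "5", "481", "12345", "-5"]
padded = [s.zfill(4) for s in samples]
print(padded)  # ['0033', '0005', '0481', '12345', '-005']

# For values that are already integers, a format spec gives the same padding.
formatted = [f"{n:04d}" for n in (33, 5, 481)]
print(formatted)  # ['0033', '0005', '0481']
```

If the column can contain values wider than four digits, it may be worth deciding up front whether a larger width is needed.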
# Answer 2
**Score**: 1
This is a very naïve method, but I hope it helps. Let's start from the basics:

```python
import pandas as pd

data = {
    "a": [9, 8, 6, 3],
    "b": [1, 1, 1, 1],
    "c": [33, 82, 5, 481],
}
# Load the data into a DataFrame object:
df = pd.DataFrame(data)
```

To loop through the elements of a column, you can use this:

```python
for x in df.c:
    print(x)  # 33, 82, 5, 481
```

Now, if you want to add a fixed number of zeros to the column, you can:

1. Create an empty list.
2. Append every column value, prefixed with leading zeros, to the list.
3. Update the column with the list.

```python
n = []
for x in df.c:
    n.append("00" + str(x))  # prefix every column value with two zeros
print(n)  # ['0033', '0082', '005', '00481']
df.c = n  # update the column with the list
```

A shorter method would be to use a list comprehension:

```python
n = ["00" + str(x) for x in df.c]
print(n)  # ['0033', '0082', '005', '00481']
df.c = n
```

But since the last column in your question always has 4 digits, I used this small piece of logic to pad each value according to its length:

```python
n = []
for x in df.c:
    if len(str(x)) == 0:
        n.append("0000" + str(x))
    elif len(str(x)) == 1:
        n.append("000" + str(x))
    elif len(str(x)) == 2:
        n.append("00" + str(x))
    elif len(str(x)) == 3:
        n.append("0" + str(x))
    else:
        n.append(str(x))
print(n)  # ['0033', '0082', '0005', '0481']
df.c = n
print(df)
#    a  b     c
# 0  9  1  0033
# 1  8  1  0082
# 2  6  1  0005
# 3  3  1  0481
```
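The length checks above can probably be collapsed into a single call to `str.zfill`. The sketch below is one way to do that. The helper name `pad_values` and the plain list `column_c` are assumptions made for illustration; they are not part of the answer's code.

```python
def pad_values(values, width=4):
    """Return each value as a string left-padded with zeros to `width`."""
    # pad_values is a hypothetical helper, not part of the original answer.
    return [str(v).zfill(width) for v in values]


column_c = [33, 82, 5, 481]  # stand-in for the values of column c above
print(pad_values(column_c))  # ['0033', '0082', '0005', '0481']
```

With the DataFrame from this answer, the result could be assigned the same way as before, for example `df.c = pad_values(df.c)`.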
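To save it into a file, you can use:

```python
# index=False leaves the row index out of the file, matching the layout asked for.
df.to_csv("new_data.csv", index=False)
```

Hope this helps you!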