Checking for omitted leading zero in column and adding if omitted
Question
I have a large dataset of addresses, which include U.S. zip codes. Some of the zip codes are in five-digit format, and others are in nine-digit format. Regardless of format, if the zip code has a leading zero (like many in Rhode Island), the leading zero has been dropped. So I need to go through the d$zip column, identify observations where the zip has length 4 or length 8, and put paste0("0", d$zip) in its place to add back the leading zero. My question is how to write the conditional check efficiently, given that I have almost 100,000 addresses.
Here is a toy df:
structure(list(ID = 1:3, street = c("555 Mockingbird Way", "909 Deadend Alley",
"1475 Wrongway Rd"), city = c("Anywhere", "Over There", "Nowhere"
), state = c("RI", "RI", "TX"), zip = c("2863", "28632142", "78215"
)), class = "data.frame", row.names = c(NA, -3L))
Note: There are two relevant questions already, but they do not address the check for 4 or 8 digit format.
Answer 1
Score: 2
Assuming that the data frame is named dataset, this should work:
dataset$new_zip <- ifelse(nchar(dataset$zip) %in% c(4, 8),
                          paste0("0", dataset$zip),
                          dataset$zip)
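As a rough illustration, the same length check can also be applied in place with a logical index instead of ifelse. This is only a sketch on a trimmed copy of the toy data from the question: the name needs_zero is made up for the example, and it assumes zip is stored as character (a numeric column would need as.character() first).

```r
# Trimmed toy data from the question; the first two zips lost their leading zero
d <- data.frame(
  ID = 1:3,
  state = c("RI", "RI", "TX"),
  zip = c("2863", "28632142", "78215"),
  stringsAsFactors = FALSE
)

# TRUE where the zip has 4 or 8 characters, i.e. a leading zero was dropped
# (needs_zero is a hypothetical helper name, not from the original answer)
needs_zero <- nchar(d$zip) %in% c(4, 8)

# Prepend "0" only to the flagged rows; all other rows are left untouched
d$zip[needs_zero] <- paste0("0", d$zip[needs_zero])

d$zip
# gives "02863", "028632142", "78215"
```

Both nchar and paste0 are vectorized, so either form should handle roughly 100,000 rows without an explicit loop.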