Checking for an omitted leading zero in a column and adding it if omitted

Question

I have a large dataset of addresses, which include U.S. zip codes. Some of the zip codes are in five-digit format, and others are in nine-digit format. Regardless of format, if the zip code has a leading zero (like many in Rhode Island), the leading zero has been dropped. So, I need to go through the d$zip column, identify observations where the zip is either length 4 or length 8, and put paste0("0", d$zip) in its place to add back the leading zero. My question is how to write the conditional check efficiently, given that I have almost 100,000 addresses.

Here is a toy df:

structure(list(ID = 1:3, street = c("555 Mockingbird Way", "909 Deadend Alley", 
"1475 Wrongway Rd"), city = c("Anywhere", "Over There", "Nowhere"
), state = c("RI", "RI", "TX"), zip = c("2863", "28632142", "78215"
)), class = "data.frame", row.names = c(NA, -3L))

Note: There are two relevant questions already, but they do not address the check for 4 or 8 digit format.

Answer 1

Score: 2

Assuming that the data frame is named dataset, this should work:

dataset$new_zip <- ifelse(nchar(dataset$zip) %in% c(4, 8),
                          paste0("0", dataset$zip),
                          dataset$zip)
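
As a complement to the ifelse() call above, here is a minimal sketch that applies the same length check in place with logical indexing, run on a cut-down version of the question's toy data. It assumes the zip column is stored as character, with no hyphen or surrounding whitespace in the nine-digit values; the names d and short are only illustrative.

```r
# Cut-down toy data from the question (only the columns needed here).
# The data frame name `d` follows the question's d$zip; it is an assumption.
d <- data.frame(
  ID  = 1:3,
  zip = c("2863", "28632142", "78215"),
  stringsAsFactors = FALSE
)

# TRUE where the zip has 4 or 8 characters, i.e. a leading zero was dropped
short <- nchar(d$zip) %in% c(4, 8)

# Prepend "0" only to those rows, overwriting the column in place
d$zip[short] <- paste0("0", d$zip[short])

d$zip
# [1] "02863"     "028632142" "78215"
```

Both forms are vectorized, so a column of roughly 100,000 zip codes should not need an explicit loop. The indexed version overwrites zip directly, whereas the ifelse() version keeps the original column and writes the result to new_zip.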

huangapple
  • Posted on May 22, 2023 at 12:44:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76303087.html