Truncating numbers, but the result does not have the right number of characters

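Question

I am trying to extract the first 6 digits of the hs10 variable, but for some values I get only 5 characters. Is there a reason for this?

I used this code:

  us_chn_tariffs_18$HS6 <- as.numeric(substr(format(us_chn_tariffs_18$hs10, scientific = F), 1, 6))

The dput output is: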

  structure(list(hs10 = structure(c(208100000, 208902500, 301110010,
  301110020, 301110090, 301990390, 302230000, 302290110, 302290190,
  302420000, 302455000, 302595010, 302595090, 302740000, 302845000,
  302895077, 302912000, 303120022, 303120032, 303230000), label = "HS10 Product Code", format.stata = "%10.0f"),
  tariff_max = structure(c(0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
  0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
  0.1), label = "US Import Tariff Increase (max)", format.stata = "%9.3f"),
  tariff_scaled = structure(c(0.0333333350718021, 0.0333333350718021,
  0.0333333350718021, 0.0333333350718021, 0.0333333350718021,
  0.0333333350718021, 0.0333333350718021, 0.0333333350718021,
  0.0333333350718021, 0.0333333350718021, 0.0333333350718021,
  0.0333333350718021, 0.0333333350718021, 0.0333333350718021,
  0.0333333350718021, 0.0333333350718021, 0.0333333350718021,
  0.0333333350718021, 0.0333333350718021, 0.0333333350718021
  ), label = "US Import Tariff Increase (scaled)", format.stata = "%9.3f"),
  effective_mdate = structure(c(704, 704, 704, 704, 704, 704,
  704, 704, 704, 704, 704, 704, 704, 704, 704, 704, 704, 704,
  704, 704), label = "Month Variety First Targeted", format.stata = "%tm"),
  month = c("9", "9", "9", "9", "9", "9", "9", "9", "9", "9",
  "9", "9", "9", "9", "9", "9", "9", "9", "9", "9"), treated = c(1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  HS6 = structure(c(20810, 20890, 30111, 30111, 30111, 30199,
  30223, 30229, 30229, 30242, 30245, 30259, 30259, 30274, 30284,
  30289, 30291, 30312, 30312, 30323), label = "HS10 Product Code", format.stata = "%10.0f")), row.names = c(NA,
  -20L), class = c("tbl_df", "tbl", "data.frame"))

Thank you

Answer 1

Score: 0

I think I can see where the problem comes from, but I cannot reproduce it exactly (I have not imported the data from Stata). The Stata print format is %10.0f, which prints 10 characters, padding your digits with leading spaces. Your IDs are all 9 digits long, so the first character that substr extracts is a space, which as.numeric then drops, leaving only 5 digits. Try reformatting hs10:

  hs10 <- c(
    208100000,
    208902500,
    301110010,
    301110020,
    301110090,
    301990390,
    302230000,
    302290110,
    302290190,
    302420000,
    302455000,
    302595010,
    302595090,
    302740000,
    302845000,
    302895077,
    302912000,
    303120022,
    303120032,
    303230000
  )
  # Reproducing the values you're getting
  hs10_not_working <- sprintf(fmt = "%10.0f", hs10)
  as.numeric(substr(format(hs10_not_working, scientific = FALSE), 1, 6))
  #> [1] 20810 20890 30111 30111 30111 30199 30223 30229 30229 30242 30245 30259
  #> [13] 30259 30274 30284 30289 30291 30312 30312 30323
  # Corrected values
  hs10_edited <- sprintf(fmt = "%-10.0f", hs10)
  as.numeric(substr(format(hs10_edited, scientific = FALSE), 1, 6))
  #> [1] 208100 208902 301110 301110 301110 301990 302230 302290 302290 302420
  #> [11] 302455 302595 302595 302740 302845 302895 302912 303120 303120 303230

The sprintf(fmt = "%-10.0f", hs10) call left-aligns the numbers (the - flag), so the padding goes on the right and the first 6 digits are extracted even when the value is only 9 digits long.
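To make the effect of the padding visible, here is a minimal sketch on a single value from the question. The variable names are only illustrative, and it assumes the padding behaves like a width-10 sprintf format, which I have not confirmed for the Stata import itself.

```r
# One value from the question's data, used for illustration only
x <- 208100000

right_aligned <- sprintf("%10.0f", x)   # width 10, space added on the left
left_aligned  <- sprintf("%-10.0f", x)  # width 10, space added on the right

# Take the first six characters of each string
first6_right <- substr(right_aligned, 1, 6)  # starts with the padding space
first6_left  <- substr(left_aligned, 1, 6)   # starts with the first digit

# as.numeric() ignores the leading space, so only five digits survive
hs6_right <- as.numeric(first6_right)
hs6_left  <- as.numeric(first6_left)

print(c(hs6_right, hs6_left))
#> [1]  20810 208100
```

Checking nchar() on the formatted strings is a quick way to see whether padding is present in your own data.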

Another simple solution is to apply trimws before taking the substring of your variable:

  us_chn_tariffs_18$HS6 <- as.numeric(substr(trimws(format(us_chn_tariffs_18$hs10, scientific = FALSE)), 1, 6))
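A related point that may explain the padding: format() on a numeric vector pads every element to the width of the widest one, so a single 10-digit code elsewhere in the column would put a leading space in front of all the 9-digit codes. The sketch below uses a hypothetical 10-digit value (1234567890, not from the question) to show this, and uses sprintf("%.0f"), which sets no field width, as an alternative. The last two lines rest on a further assumption that I cannot verify from the dput: that the codes are really 10 digits long and lost a leading zero when stored as numbers.

```r
# Two values from the question plus one hypothetical 10-digit code
hs10 <- c(208100000, 302895077, 1234567890)

# format() pads all elements to a common width (10 characters here)
padded <- format(hs10, scientific = FALSE)
print(nchar(padded))
#> [1] 10 10 10

# sprintf("%.0f") sets no field width, so there is nothing to trim
unpadded <- sprintf("%.0f", hs10)
hs6 <- as.numeric(substr(unpadded, 1, 6))
print(hs6)
#> [1] 208100 302895 123456

# Assumption: codes are 10 digits with a dropped leading zero.
# Zero-pad to width 10 and keep the result as character to preserve the zero.
zero_padded <- sprintf("%010.0f", hs10)
hs6_chr <- substr(zero_padded, 1, 6)
print(hs6_chr)
#> [1] "020810" "030289" "123456"
```

If that assumption holds, the 5-digit values in the question (for example 20810) would already be the numeric form of the 6-digit code 020810, so it is worth deciding whether HS6 should be stored as a number or as a character string before picking a fix.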

huangapple
  • Published on February 27, 2023, 06:45:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75575446.html