Code for finding the middle letter of a name in the "babynames" dataset gives a faulty result when the name has 6 letters

I am trying to solve an exercise from R for Data Science (2e), chapter 16 (section 16.5.4, question 1), which asks for the middle letter of every name in the dataset. I wrote the code below to extract the middle letter when a name has an odd number of letters, or the middle two letters when it has an even number of letters.


    library(tidyverse)
    library(babynames)

    babynames |>
      mutate(
        length = str_length(name),
        middle = if_else((length / 2) %% 2 != 0,
                         str_sub(name, ceiling(length / 2), ceiling(length / 2)),
                         str_sub(name, length / 2, (length / 2) + 1))
      )

The code gives the expected result except when the name has 6 letters: instead of extracting the middle two letters, it returns only the first of the two.

    # A tibble: 1,924,665 × 7
    year sex   name          n   prop length middle
   <dbl> <chr> <chr>     <int>  <dbl>  <int> <chr> 
 1  1880 F     Mary       7065 0.0724      4 ar    
 2  1880 F     Anna       2604 0.0267      4 nn    
 3  1880 F     Emma       2003 0.0205      4 mm    
 4  1880 F     Elizabeth  1939 0.0199      9 a     
 5  1880 F     Minnie     1746 0.0179      6 n     
 6  1880 F     Margaret   1578 0.0162      8 ga    
 7  1880 F     Ida        1472 0.0151      3 d     
 8  1880 F     Alice      1414 0.0145      5 i     
 9  1880 F     Bertha     1320 0.0135      6 r     
10  1880 F     Sarah      1288 0.0132      5 r     
# … with 1,924,655 more rows
# ℹ Use `print(n = ...)` to see more rows

I don't understand why the code makes an exception for names with 6 letters. What could be the reason?


Score: 0


Your check for odd/even values is incorrect. Look at

data.frame(length=1:10) |>
  mutate(compare=(length / 2) %% 2 != 0)
#    length compare
# 1       1    TRUE
# 2       2    TRUE
# 3       3    TRUE
# 4       4   FALSE
# 5       5    TRUE
# 6       6    TRUE
# 7       7    TRUE
# 8       8   FALSE
# 9       9    TRUE
# 10     10    TRUE

Notice that the condition does not track odd and even lengths. Because of the extra /2, it is FALSE only when the length is divisible by 4, so even lengths such as 6 are treated as odd. You should be using

babynames |>
  mutate(
    length = str_length(name),
    middle = if_else(length %% 2 != 0,
                     str_sub(name, ceiling(length / 2), ceiling(length / 2)),
                     str_sub(name, length / 2, (length / 2) + 1))
  )
#     year sex   name          n   prop length middle
#    <dbl> <chr> <chr>     <int>  <dbl>  <int> <chr> 
#  1  1880 F     Mary       7065 0.0724      4 ar    
#  2  1880 F     Anna       2604 0.0267      4 nn    
#  3  1880 F     Emma       2003 0.0205      4 mm    
#  4  1880 F     Elizabeth  1939 0.0199      9 a     
#  5  1880 F     Minnie     1746 0.0179      6 nn    
#  6  1880 F     Margaret   1578 0.0162      8 ga   
# ...
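
As a possible alternative, the odd/even branch can be avoided altogether: taking the substring from ceiling(n / 2) to floor(n / 2) + 1 yields one letter for odd lengths and two for even lengths. The sketch below is not part of the original answer. The helper name middle_letters is hypothetical, and it uses base R (nchar, substr) rather than stringr so that it runs without extra packages.

```r
# Hypothetical helper (not from the original post): middle letter(s) of each name.
# Uses base R only, so it runs without stringr or dplyr.
middle_letters <- function(name) {
  n <- nchar(name)
  # Odd n:  ceiling(n / 2) equals floor(n / 2) + 1, so one letter is returned.
  # Even n: the two positions differ by one, so two letters are returned.
  substr(name, ceiling(n / 2), floor(n / 2) + 1)
}

names_sample <- c("Mary", "Elizabeth", "Minnie", "Ida", "Bertha")
middle_letters(names_sample)

# Why the original test misfires for 6 letters: 6 / 2 is 3, and 3 %% 2 is 1,
# so the name is sent down the odd-length branch.
(6 / 2) %% 2 != 0  # TRUE: treated as odd
6 %% 2 != 0        # FALSE: correctly even
```

Inside mutate(), the same idea should carry over as str_sub(name, ceiling(length / 2), floor(length / 2) + 1), which would remove the need for if_else().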

  • Posted on February 23, 2023 at 22:51:07
