2023-02-23 22:51:07


Code for finding the middle letter of a name in "babynames" dataset is giving faulty result when the name has 6 letters

Question

I am trying to solve an exercise from R for Data Science (2e), chapter 16 (section 16.5.4, question 1), which asks for the middle letter of every name in the dataset. So I wrote the code below to extract the middle letter if the name has an odd number of letters, or the middle two letters if the name has an even number of letters.

library(tidyverse)
library(babynames)

babynames |>
  mutate(
    length = str_length(name),
    middle = if_else((length / 2) %% 2 != 0,
             str_sub(name, ceiling(length / 2), ceiling(length / 2)),
             str_sub(name, length / 2, (length / 2) + 1)
    )
  )

The code gives me the result I expect except when the name has 6 letters. Instead of extracting the middle two letters, it shows only the first of the two:

# A tibble: 1,924,665 × 7
    year sex   name          n   prop length middle
   <dbl> <chr> <chr>     <int>  <dbl>  <int> <chr> 
 1  1880 F     Mary       7065 0.0724      4 ar    
 2  1880 F     Anna       2604 0.0267      4 nn    
 3  1880 F     Emma       2003 0.0205      4 mm    
 4  1880 F     Elizabeth  1939 0.0199      9 a     
 5  1880 F     Minnie     1746 0.0179      6 n     
 6  1880 F     Margaret   1578 0.0162      8 ga    
 7  1880 F     Ida        1472 0.0151      3 d     
 8  1880 F     Alice      1414 0.0145      5 i     
 9  1880 F     Bertha     1320 0.0135      6 r     
10  1880 F     Sarah      1288 0.0132      5 r     
# … with 1,924,655 more rows
# ℹ Use `print(n = ...)` to see more rows

I don't understand why the code is making an exception for the names with 6 letters. What can be the reason behind this?

Answer 1

Score: 0

Your check for odd/even values is incorrect. Look at what the condition returns for lengths 1 to 10:

data.frame(length=1:10) |>
  mutate(compare=(length / 2) %% 2 != 0)
#    length compare
# 1       1    TRUE
# 2       2    TRUE
# 3       3    TRUE
# 4       4   FALSE
# 5       5    TRUE
# 6       6    TRUE
# 7       7    TRUE
# 8       8   FALSE
# 9       9    TRUE
# 10     10    TRUE

Notice that the condition does not separate even from odd values. Because of the extra /2, it is FALSE only when the length is divisible by 4, so every other length, including 6, falls into the single-letter branch. You should be using

babynames |>
  mutate(
    length = str_length(name),
    middle = if_else(length %% 2 != 0, 
                     str_sub(name, ceiling(length / 2), ceiling(length / 2)),
                     str_sub(name, length / 2, (length / 2)+1)
    )
  )
#     year sex   name          n   prop length middle
#    <dbl> <chr> <chr>     <int>  <dbl>  <int> <chr> 
#  1  1880 F     Mary       7065 0.0724      4 ar    
#  2  1880 F     Anna       2604 0.0267      4 nn    
#  3  1880 F     Emma       2003 0.0205      4 mm    
#  4  1880 F     Elizabeth  1939 0.0199      9 a     
#  5  1880 F     Minnie     1746 0.0179      6 nn    
#  6  1880 F     Margaret   1578 0.0162      8 ga   
# ...
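
As a side note that is not part of the original answer, the odd/even test can be avoided altogether, because ceiling(length / 2) and floor(length / 2) + 1 point at the same position for odd lengths and at the two middle positions for even lengths. The sketch below is only an illustration: middle_letters is a hypothetical helper name, and it uses base R's nchar() and substr() in place of str_length() and str_sub() so that it runs without extra packages.

```r
# Hypothetical helper (not from the original answer):
# return the middle letter(s) of each name without an odd/even branch.
middle_letters <- function(name) {
  len <- nchar(name)
  # Odd length:  ceiling(len / 2) equals floor(len / 2) + 1, so one letter.
  # Even length: the positions are len / 2 and len / 2 + 1, so two letters.
  substr(name, ceiling(len / 2), floor(len / 2) + 1)
}

middle_letters(c("Mary", "Minnie", "Ida", "Elizabeth"))
# [1] "ar" "nn" "d"  "a"
```

Inside the pipeline above, the same idea would presumably read middle = str_sub(name, ceiling(length / 2), floor(length / 2) + 1), assuming the length column has already been created in the same mutate() call.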
