February 10, 2023, 04:21:22


How to subtract the value of some rows from other rows in R according to categorical variables

Question


I am trying to determine the number of verbs (n) in each tense and aspect (TA) for each text (filename) in my dataset. I have my data saved as a tibble (see below), but the values in the n column are not yet accurate because some categories subsume others. For instance, to get an accurate count of past_simple, I need to subtract the number (n) for past_perfect and past_progressive in that same text. Essentially, I am looking to do something like this:

Subtract the value (n) of past_perfect and past_progressive from the value of past_simple for filename BIO.G0.01.1
Repeat this process for each individual file

tib <- structure(
  list(
    TA = c(
      "past_perfect", "past_progressive", "past_simple", "past_simple",
      "past_simple", "past_simple", "past_simple", "past_simple",
      "past_perfect", "past_progressive"
    ),
    tense = c(
      "past", "past", "past", "past", "past",
      "past", "past", "past", "past", "past"
    ),
    aspect = c(
      "perfect", "progressive", "simple", "simple", "simple",
      "simple", "simple", "simple", "perfect", "progressive"
    ),
    filename = c(
      "BIO.G0.01.1", "BIO.G0.01.1", "BIO.G0.01.1", "BIO.G0.02.1",
      "BIO.G0.02.2", "BIO.G0.02.4", "BIO.G0.02.5", "BIO.G0.02.6",
      "BIO.G0.03.1", "BIO.G0.03.1"
    ),
    discipline = c(
      "BIO", "BIO", "BIO", "BIO", "BIO", "BIO", "BIO", "BIO", "BIO", "BIO"
    ),
    nativeness = c("NS", "NS", "NS", "NS", "NS", "NS", "NS", "NS", "NS", "NS"),
    year = c("G0", "G0", "G0", "G0", "G0", "G0", "G0", "G0", "G0", "G0"),
    gender = c("F", "F", "F", "M", "M", "M", "M", "M", "F", "F"),
    n = c(2L, 2L, 57L, 39L, 3L, 4L, 49L, 103L, 1L, 1L)
  ),
  class = c("grouped_df", "tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -10L),
  groups = structure(
    list(
      filename = c(
        "BIO.G0.01.1", "BIO.G0.02.1", "BIO.G0.02.2", "BIO.G0.02.4",
        "BIO.G0.02.5", "BIO.G0.02.6", "BIO.G0.03.1"
      ),
      discipline = c("BIO", "BIO", "BIO", "BIO", "BIO", "BIO", "BIO"),
      nativeness = c("NS", "NS", "NS", "NS", "NS", "NS", "NS"),
      year = c("G0", "G0", "G0", "G0", "G0", "G0", "G0"),
      gender = c("F", "M", "M", "M", "M", "M", "F"),
      .rows = structure(
        list(1:3, 4L, 5L, 6L, 7L, 8L, 9:10),
        ptype = integer(0),
        class = c("vctrs_list_of", "vctrs_vctr", "list")
      )
    ),
    class = c("tbl_df", "tbl", "data.frame"),
    row.names = c(NA, -7L),
    .drop = TRUE
  )
)

I know how to subtract rows from other rows based on their position, like this:

tib[3, 9] <- tib[3, 9] - tib[2, 9] - tib[1, 9]

But the rows do not always appear in this predictable order because not all TA options are present in each text (filename). I'm also not sure how to write the code to restart this process again each time it comes across a new filename.

I am still learning how to manipulate data in R. Any suggestions would be very much appreciated!

Answer 1

Score: 0


Based on the logic shown, we may need a group-by operation. If some 'TA' categories are missing for a given filename, it may be better to reshape to wide format first (filling the missing categories with 0) and then join the result back to the original data.

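library(dplyr)
library(tidyr)

tib %>%
   ungroup() %>%
   # drop columns that differ across TA values so each file collapses to one row
   select(-tense, -aspect) %>%
   # one column per TA category; categories absent from a file are filled with 0
   pivot_wider(names_from = TA, values_from = n, values_fill = 0) %>%
   # corrected past_simple count; .keep = 'unused' drops the three wide columns
   mutate(n1 = past_simple - past_progressive - past_perfect,
       TA = 'past_simple', .keep = 'unused') %>%
   # join back; n1 is NA on every row that is not past_simple
   left_join(tib %>% ungroup(), .) %>%
   # use the corrected value where available, otherwise keep the original n
   mutate(n = coalesce(n1, n), .keep = 'unused')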

-output

# A tibble: 10 × 9
   TA               tense aspect      filename    discipline nativeness year  gender     n
   <chr>            <chr> <chr>       <chr>       <chr>      <chr>      <chr> <chr>  <int>
 1 past_perfect     past  perfect     BIO.G0.01.1 BIO        NS         G0    F          2
 2 past_progressive past  progressive BIO.G0.01.1 BIO        NS         G0    F          2
 3 past_simple      past  simple      BIO.G0.01.1 BIO        NS         G0    F         53
 4 past_simple      past  simple      BIO.G0.02.1 BIO        NS         G0    M         39
 5 past_simple      past  simple      BIO.G0.02.2 BIO        NS         G0    M          3
 6 past_simple      past  simple      BIO.G0.02.4 BIO        NS         G0    M          4
 7 past_simple      past  simple      BIO.G0.02.5 BIO        NS         G0    M         49
 8 past_simple      past  simple      BIO.G0.02.6 BIO        NS         G0    M        103
 9 past_perfect     past  perfect     BIO.G0.03.1 BIO        NS         G0    F          1
10 past_progressive past  progressive BIO.G0.03.1 BIO        NS         G0    F          1
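
The group-by operation mentioned above can also be written directly, without reshaping. The following is only a minimal sketch, not the answerer's code: `toy`, `subsumed`, and `adjusted` are illustrative names, `toy` is a cut-down stand-in for `tib`, and it assumes each filename has at most one row per TA category.

```r
library(dplyr)

# Small stand-in for the question's `tib`: only the columns the calculation needs.
toy <- tibble(
  filename = c("BIO.G0.01.1", "BIO.G0.01.1", "BIO.G0.01.1",
               "BIO.G0.02.1", "BIO.G0.03.1", "BIO.G0.03.1"),
  TA = c("past_perfect", "past_progressive", "past_simple",
         "past_simple", "past_perfect", "past_progressive"),
  n = c(2L, 2L, 57L, 39L, 1L, 1L)
)

# Categories whose counts are currently included in past_simple (from the question).
subsumed <- c("past_perfect", "past_progressive")

adjusted <- toy %>%
  group_by(filename) %>%
  mutate(
    # Within each file, subtract the subsumed counts from the past_simple row only.
    # sum() over an empty selection is 0L, so files lacking those categories are unchanged.
    n = if_else(TA == "past_simple", n - sum(n[TA %in% subsumed]), n)
  ) %>%
  ungroup()

adjusted
```

Because the rows are selected by their TA value inside each filename group, the result does not depend on row order or on which categories happen to be present in a given file.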
