March 7, 2023, 10:08:09

Create valid range from a factor and apply another factor range in R

Question

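I have a CSV file with two columns: the first column holds factor categories representing vessel size ranges, and the second holds the vessel class that falls into that size category. I need to use these data to fill in a new table with a different set of established vessel size ranges. For example, my initial raw data has two columns: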

dat.start <- data.frame(category = c(rep("1-10", 3), rep("11-20", 3), rep("21-30", 3), rep("32-40", 3), rep("41-50", 3), rep("51-59", 3)), class = rep(c("a", "b", "c"), 6))

When I aggregate class by category, e.g.:

ag.dat <- aggregate(class ~ category, data = dat.start, length)

I get a data frame whose structure, as shown by str(ag.dat), consists of a chr column and an int column.

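The next problem is that I need to assign those vessel size frequencies to a table of new, predetermined vessel size categories that differ from the original ones. For example, below are the new size categories and the vessel class frequencies derived from the original dat.start data: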

dat.end <- data.frame(category = c("1-20", "21-50", ">50"), class = c(6, 9, 3))

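So my question is: how do I get from dat.start to dat.end? My first thought was to split the character string categories and create new dat.start and dat.end ranges that can be interpreted numerically, similar to what cut produces. But I drew a blank on the next step of actually computing the vessel class frequencies for the new categories. Converting the character string ranges to numeric ranges also stumped me.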

The closest solution I found on the web is, I think, this one: https://stackoverflow.com/questions/71078299/identify-the-matching-range-from-a-list-of-valid-range

However, it looks like it is written for Python/Pandas, and I need to do it in R. Thanks.

Answer 1

Score: 2

If you separate "category" into two columns, you can make numerical comparisons, e.g.

library(tidyverse)
dat.start<-data.frame(category=c(rep("1-10",3), rep("11-20",3), rep("21-30",3), rep("32-40",3), rep("41-50",3), rep("51-59",3)), class=rep(c("a","b","c"),6))
dat.start
#>    category class
#> 1      1-10     a
#> 2      1-10     b
#> 3      1-10     c
#> 4     11-20     a
#> 5     11-20     b
#> 6     11-20     c
#> 7     21-30     a
#> 8     21-30     b
#> 9     21-30     c
#> 10    32-40     a
#> 11    32-40     b
#> 12    32-40     c
#> 13    41-50     a
#> 14    41-50     b
#> 15    41-50     c
#> 16    51-59     a
#> 17    51-59     b
#> 18    51-59     c
dat.end<-data.frame(category=c("1-20", "21-50", ">50"), class=c(6, 9, 3))
dat.end
#>   category class
#> 1     1-20     6
#> 2    21-50     9
#> 3      >50     3
dat.start %>%
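  # convert = TRUE parses min and max as integers; without it they stay
  # character and the comparisons below are lexicographic
  separate(category, into = c("min", "max"), sep = "-", convert = TRUE) %>%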
  mutate(category = case_when(max <= 20 ~ "1-20",
                              min > 20 & max <= 50 ~ "21-50",
                              min > 50 ~ ">50")) %>%
  summarise(class = n(), .by = category)
#>   category class
#> 1     1-20     6
#> 2    21-50     9
#> 3      >50     3
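
The question mentions cut, so here is a minimal base R sketch of the same regrouping without tidyverse. It assumes every source range lies entirely inside one target range, so the upper bound alone decides the new category; the names upper and new_category are made up for this illustration.

```r
dat.start <- data.frame(
  category = c(rep("1-10", 3), rep("11-20", 3), rep("21-30", 3),
               rep("32-40", 3), rep("41-50", 3), rep("51-59", 3)),
  class = rep(c("a", "b", "c"), 6)
)

# Upper bound of each source range, e.g. "11-20" -> 20
# (assumes every label has the form "low-high")
upper <- as.numeric(sub(".*-", "", dat.start$category))

# cut() uses right-closed intervals by default: (0, 20], (20, 50], (50, Inf]
new_category <- cut(upper,
                    breaks = c(0, 20, 50, Inf),
                    labels = c("1-20", "21-50", ">50"))

# Count rows per new category and name the count column "class"
dat.end <- as.data.frame(table(category = new_category),
                         responseName = "class")
dat.end
# counts per category: 6, 9, 3
```

With this layout only the breaks and labels vectors need to change when the target ranges change. A source range that straddles a break would be assigned by its upper bound alone, which may not be what you want.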

Another potential approach is to use a lookup table, e.g.

lookup_table <- setNames(c("1-20", "1-20", "21-50",
                           "21-50", "21-50", ">50"),
                         unique(dat.start$category))
lookup_table
#>    1-10   11-20   21-30   32-40   41-50   51-59 
#>  "1-20"  "1-20" "21-50" "21-50" "21-50"   ">50"
dat.start %>%
  mutate(category = recode(category, !!!lookup_table)) %>%
  summarise(class = n(), .by = category)
#>   category class
#> 1     1-20     6
#> 2    21-50     9
#> 3      >50     3

Created on 2023-03-07 with reprex v2.0.2

There are many different ways to use a lookup table for this type of task; see https://stackoverflow.com/questions/67081496/canonical-tidyverse-method-to-update-some-values-of-a-vector-from-a-look-up-tabl for more methods and examples.
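
As one example of such a variation, the lookup can also be done in base R by indexing a named character vector. This is only a sketch: the mapping is typed out by hand here, and dat.new is a hypothetical helper data frame, not part of the original answer.

```r
dat.start <- data.frame(
  category = c(rep("1-10", 3), rep("11-20", 3), rep("21-30", 3),
               rep("32-40", 3), rep("41-50", 3), rep("51-59", 3)),
  class = rep(c("a", "b", "c"), 6)
)

# Names are the old categories, values are the new ones
lookup_table <- c("1-10" = "1-20", "11-20" = "1-20",
                  "21-30" = "21-50", "32-40" = "21-50", "41-50" = "21-50",
                  "51-59" = ">50")

# Index by name; as.character() guards against category being a factor,
# which would otherwise index by integer code
new_category <- unname(lookup_table[as.character(dat.start$category)])

# Any old category missing from the lookup would show up as NA
stopifnot(!anyNA(new_category))

# A factor with explicit levels keeps the output in the desired order
dat.new <- data.frame(
  category = factor(new_category, levels = c("1-20", "21-50", ">50")),
  class = dat.start$class
)

dat.end <- aggregate(class ~ category, data = dat.new, FUN = length)
dat.end
# counts per category: 6, 9, 3
```

The explicit NA check is there because a typo in a lookup name would otherwise silently drop rows from the aggregate.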
