February 8, 2023, 23:14:06

Fill sequence of scaled numbers in R

Question

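I'm trying to complete a data frame with scaled scores. First, I have a set of scores that each relate to a grade, along with a universal score that has already been calculated for each grade boundary.

library(dplyr)
library(tidyr)  # complete(), full_seq() and fill() come from tidyr
df <- tibble(grade = c("X", "E", "D", "C", "B", "A", "Max"),
             score = c(0,17,25,33,41,48,60),
             universal = c(0,22,44,65,87,108,108))

I expand the data frame to include every integer value of score:

df %>% complete(score = full_seq(score, period = 1)) %>%
  fill(grade, .direction = "down")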

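I now want to fill in the universal score for each integer score, based on the relative steps between the universal scores already defined at the grade boundaries.

This is based on a conversion/scaling factor:
(universal boundary of the grade above - universal boundary of the current grade) / (score boundary of the grade above - score boundary of the current grade)
For grade X this would be (22-0) / (17-0) = 1.29. Each universal score is the previous universal score plus this factor.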
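As a rough illustration of the factor described above, here is a minimal base R sketch. It assumes the boundary vectors from the question; the names `factor_per_grade`, `x_start` and `first_steps` are made up for this example.

```r
# Grade boundaries copied from the question
grade     <- c("X", "E", "D", "C", "B", "A", "Max")
score     <- c(0, 17, 25, 33, 41, 48, 60)
universal <- c(0, 22, 44, 65, 87, 108, 108)

# One factor per interval between consecutive boundaries:
# (universal step) / (score step). "Max" only closes the last interval.
factor_per_grade <- diff(universal) / diff(score)
names(factor_per_grade) <- head(grade, -1)

# Universal scores for the first few integer scores in grade X:
# start at the lower boundary and add the factor once per score step
x_start <- universal[1]
first_steps <- x_start + (0:3) * factor_per_grade[["X"]]
round(first_steps, 2)  # 0.00 1.29 2.59 3.88
```

Because the factor is constant within a grade, adding it repeatedly is the same as linear interpolation between the two boundaries of that grade.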

So the first part of the result should look like this:

score  grade  universal
0      X       0
1      X       1.29
2      X       2.59
3      X       3.88
4      X       5.18
5      X       6.47
6      X       7.76
7      X       9.06
8      X      10.35
9      X      11.65
10     X      12.94
11     X      14.24
12     X      15.53
13     X      16.82
14     X      18.12
15     X      19.41
16     X      20.71
17     E      22.00

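I've tried to do this with tidy principles and various combinations of group_by(), complete(), seq(), etc., but haven't found a neat way. I think my problem is that the upper boundary of each grade lies outside its group. Any help will be much appreciated.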


Answer 1

Score: 2

Base R has the `approx()` function to do this linear interpolation. You can use it in a tidyverse context like this:

df %>%
  complete(score = full_seq(score, period = 1)) %>%
  fill(grade, .direction = "down") %>%
  # interpolate between the known boundaries; approx() drops the NA rows when fitting
  mutate(universal = approx(x = score, y = universal, xout = score)$y)

# A tibble: 61 × 3
   score grade universal
   <dbl> <chr>     <dbl>
 1     0 X          0
 2     1 X          1.29
 3     2 X          2.59
 4     3 X          3.88
 5     4 X          5.18
 6     5 X          6.47
 7     6 X          7.76
 8     7 X          9.06
 9     8 X         10.4
10     9 X         11.6
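To see what `approx()` is doing on its own, here is a small base R sketch that works outside the pipeline. It assumes the boundary vectors from the question; `all_scores`, `interp`, `y_with_gaps` and `interp_gaps` are names invented for this illustration.

```r
# Grade boundaries copied from the question
score     <- c(0, 17, 25, 33, 41, 48, 60)
universal <- c(0, 22, 44, 65, 87, 108, 108)

# Interpolate linearly at every integer score between the boundaries
all_scores <- 0:60
interp <- approx(x = score, y = universal, xout = all_scores)$y

round(interp[1:4], 2)  # scores 0 to 3: 0.00 1.29 2.59 3.88

# Inside the pipeline, universal is NA on the rows added by complete().
# By default approx() ignores (x, y) pairs containing NA, so passing the
# gappy column gives the same result as passing only the boundaries.
y_with_gaps <- rep(NA_real_, length(all_scores))
y_with_gaps[score + 1] <- universal
interp_gaps <- approx(x = all_scores, y = y_with_gaps, xout = all_scores)$y

all.equal(interp, interp_gaps)  # TRUE
```

This is why the `mutate()` call above can pass the partly missing `universal` column straight to `approx()`.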


Answer 2

Score: 1

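df %>%
  # slope of each grade interval; the last boundary has no interval above it
  mutate(inc = c(diff(universal) / diff(score), NA)) %>%
  complete(score = full_seq(score, period = 1)) %>%
  # carry grade and slope down to the new rows (this also fills the trailing NA)
  fill(grade, inc, .direction = "down") %>%
  group_by(grade) %>%
  # start at each grade's boundary value and add one slope per score step
  mutate(universal = first(universal) + (row_number() - 1) * inc) %>%
  ungroup() %>%
  print(n = 30)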

# A tibble: 61 × 4
   score grade universal   inc
   <dbl> <chr>     <dbl> <dbl>
 1     0 X          0     1.29
 2     1 X          1.29  1.29
 3     2 X          2.59  1.29
 4     3 X          3.88  1.29
 5     4 X          5.18  1.29
 6     5 X          6.47  1.29
 7     6 X          7.76  1.29
 8     7 X          9.06  1.29
 9     8 X         10.4   1.29
10     9 X         11.6   1.29
11    10 X         12.9   1.29
12    11 X         14.2   1.29
13    12 X         15.5   1.29
14    13 X         16.8   1.29
15    14 X         18.1   1.29
16    15 X         19.4   1.29
17    16 X         20.7   1.29
18    17 E         22     2.75
19    18 E         24.8   2.75
20    19 E         27.5   2.75
21    20 E         30.2   2.75
22    21 E         33     2.75
23    22 E         35.8   2.75
24    23 E         38.5   2.75
25    24 E         41.2   2.75
26    25 D         44     2.62
27    26 D         46.6   2.62
28    27 D         49.2   2.62
29    28 D         51.9   2.62
30    29 D         54.5   2.62
# … with 31 more rows
# ℹ Use `print(n = ...)` to see more rows
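As a cross-check, the step-by-step idea above can be compared with linear interpolation in plain base R. This is only a sketch, not the answerer's code: `findInterval()` stands in for the `group_by(grade)` step, and the names `all_scores`, `idx`, `slope`, `by_steps` and `by_approx` are made up for this example.

```r
# Grade boundaries copied from the question
score     <- c(0, 17, 25, 33, 41, 48, 60)
universal <- c(0, 22, 44, 65, 87, 108, 108)

all_scores <- seq(min(score), max(score))

# Index of the interval each score falls into; rightmost.closed = TRUE
# puts the final boundary (60) into the last interval instead of a new one
idx <- findInterval(all_scores, score, rightmost.closed = TRUE)

# Slope per interval, then lower boundary + steps taken * slope
slope    <- diff(universal) / diff(score)
by_steps <- universal[idx] + (all_scores - score[idx]) * slope[idx]

# The same values via linear interpolation
by_approx <- approx(x = score, y = universal, xout = all_scores)$y

all.equal(by_steps, by_approx)  # TRUE
```

Both routes give the same 61 values, so the choice between them is mostly about readability: `approx()` is shorter, while the grouped version keeps the per-grade increment as a visible column.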

