2023-06-13 12:47:33

Match values in both rows and columns from another dataframe

Question

```R
L <- c(0, 500, 1000, 2000, 3000, 5000, 10000, 20000, 50000)
fli.1 <- c(0, 0.1, 0.2, 0.4, 0.8, 0.9, 1, 1.2, 1.8)
fli.2 <- c(0, 0.11, 0.21, 0.42, 0.84, 0.95, 1.05, 1.26, 1.89)
fli.3 <- c(0, 0.11, 0.22, 0.44, 0.88, 0.99, 1.1, 1.32, 1.98)
fli.4 <- c(0, 0.12, 0.23, 0.46, 0.93, 1.04, 1.16, 1.39, 2.08)
fli.5 <- c(0, 0.12, 0.24, 0.49, 0.97, 1.09, 1.22, 1.46, 2.19)
data <- data.frame(L, fli.1, fli.2, fli.3, fli.4, fli.5)
d <- data.frame(
  quantity = c(300, 368, 568, 20, 1000, 37659, 45000, 2500, 4500, 78453, 1200, 1589),
  fli = c("fli.1", "fli.1", "fli.4", "fli.5", "fli.2", "fli.2", "fli.5", "fli.1", "fli.2", "fli.2", "fli.3", "fli.4")
)
```

I need to create another column in the dataframe d such that each entry takes its value from the lookup table data.
The row should be the last one whose L is less than the quantity.
The column should be the one named in fli.
For example, for quantity 37659 with fli.2 it would be the 8th row (L = 20000) and the fli.2 column, which gives 1.26.

I have tried using a matrix, but it takes too much time. Note that this is sample data; I need to apply it to a very large dataset.

Answer 1

Score: 1

As *L* is sorted in *data*, you can use `findInterval` to get the row and `match` to get the column, `cbind` the indices, and use them to subset *data*.

```R
d$value <-
  data[cbind(findInterval(d$quantity, data$L), match(d$fli, names(data)))]
d
#   quantity   fli value
#1       300 fli.1  0.00
#2       368 fli.1  0.00
#3       568 fli.4  0.12
#4        20 fli.5  0.00
#5      1000 fli.2  0.21
#6     37659 fli.2  1.26
#7     45000 fli.5  1.46
#8      2500 fli.1  0.40
#9      4500 fli.2  0.84
#10    78453 fli.2  1.89
#11     1200 fli.3  0.22
#12     1589 fli.4  0.23
```

---

Benchmark

```R
library(dplyr)
bench::mark(check=FALSE,
"Jon Spring" = {d |>
  left_join(tidyr::pivot_longer(data, -L, names_to = "fli"), 
            join_by(fli,  closest(quantity >= L)))},
GKi = cbind(d, value=data[cbind(findInterval(d$quantity, data$L), match(d$fli, names(data)))]))
#  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#1 Jon Spring   5.22ms   5.29ms      188.    21.4KB     8.45    89     4
#2 GKi        198.77µs 207.86µs     4742.      480B    10.3   2313     5
```

In this case GKi is about 25 times faster than Jon Spring and allocates less memory.
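To make the indexing in this answer easier to follow, here is a minimal sketch on a cut-down copy of the question's table (the first four rows and two `fli` columns only). The intermediate names `row_idx`, `col_idx`, `idx` and `value` are introduced here for illustration and are not part of the original answer.

```R
# Cut-down lookup table and queries taken from the question's sample data
data <- data.frame(
  L     = c(0, 500, 1000, 2000),
  fli.1 = c(0, 0.1, 0.2, 0.4),
  fli.2 = c(0, 0.11, 0.21, 0.42)
)
d <- data.frame(quantity = c(300, 1000, 2500),
                fli = c("fli.1", "fli.2", "fli.1"))

# Row index: position of the largest L that is <= quantity (L must be sorted)
row_idx <- findInterval(d$quantity, data$L)

# Column index: position of each fli name among the column names of data
col_idx <- match(d$fli, names(data))

# A two-column matrix of (row, column) pairs picks one cell per query
idx <- cbind(row_idx, col_idx)
value <- as.matrix(data)[idx]
value
```

If a quantity can be smaller than the first value of `L`, `findInterval` returns 0 for it, so such rows would need to be handled separately. This does not occur in the sample data.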
Answer 2

Score: 0

Modifying from my answer to your prior question, of which this seems to be a minor revision. (It's best practice to promptly edit your question if you realize it's unclear or not what you meant to ask.)

```R
library(dplyr) # v1.1.0+
library(tidyr) # provides pivot_longer()
d |>
  left_join(pivot_longer(data, -L, names_to = "fli"), 
            join_by(fli,  closest(quantity >= L)))
```

Result

```
   quantity   fli     L value
1       300 fli.1     0  0.00
2       368 fli.1     0  0.00
3       568 fli.4   500  0.12
4        20 fli.5     0  0.00
5      1000 fli.2  1000  0.21
6     37659 fli.2 20000  1.26
7     45000 fli.5 20000  1.46
8      2500 fli.1  2000  0.40
9      4500 fli.2  3000  0.84
10    78453 fli.2 50000  1.89
11     1200 fli.3  1000  0.22
12     1589 fli.4  1000  0.23
```


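Answer 3

Score: 0

Here, I provide a slightly different approach.

First, I changed *data* from wide format to long format, and renamed the columns for merging the tables later.

```R
library(reshape2)
# Wide to long: one row per (L, fli) pair
data_DT <- melt(data, id = "L")
names(data_DT) <- c("L", "fli", "value")
```

Then, I divide the quantity into groups, using breakpoints based on the values of L in data. The groups will be something like (0,500], (500,1000], and so on. Using simple regex matching, I can then obtain the lower bound of each group, which is used to merge with the first table.

```R
library(data.table)
d_DT <- data.table(d)
# Bin quantity by the L breakpoints; labels look like "(0,500]"
d_DT[, quantity_group := cut(quantity, c(data[, "L"], Inf))]
# Drop everything from the comma on, then the leading "(", keeping the lower bound
d_DT[, L := as.numeric(gsub("^.","",gsub(",.*","",quantity_group)))]
# Join on the lower bound and the fli name
d_DT <- merge(d_DT, data_DT, by = c("L", "fli"))
```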
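One detail worth checking with this approach is how `cut` treats a quantity that sits exactly on a breakpoint. The sketch below, using a few values from the question, compares the default `right = TRUE` with `right = FALSE`, and shows `findInterval` as one possible way to get the lower bound without parsing interval labels. The names `grp_right`, `grp_left` and `lower` are illustrative and not part of the original answer.

```R
L <- c(0, 500, 1000, 2000, 3000, 5000, 10000, 20000, 50000)
quantity <- c(300, 1000, 37659)

# Default right = TRUE: intervals are (lower, upper], so 1000 falls in (500,1000]
grp_right <- as.integer(cut(quantity, c(L, Inf)))

# right = FALSE: intervals are [lower, upper), so 1000 falls in [1000,2000)
grp_left <- as.integer(cut(quantity, c(L, Inf), right = FALSE))

# Lower bound taken directly from L, with no regex on the labels
lower <- L[findInterval(quantity, L)]

grp_right
grp_left
lower
```

With the default, 1000 is assigned to the group below it, so its lower bound comes out as 500 rather than 1000. For quantities that equal a value of L, this approach can therefore give a different result from the `findInterval` and `closest(quantity >= L)` answers.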

