如何将存储在数据表列中的数据框列的类型转换为数据表列。

huangapple go评论65阅读模式
英文:

How to convert the type of a column stored in a data.frame within a data.table column

问题

你可以在使用 rbindlist 之前,通过 lapply 或者 sapply 来将 data.frames 中的列转换为相同的类型。以下是一个示例代码,将所有列都转换为 double 类型:

library(data.table)

# 将列转换为double类型
DT[, fund_metrics := lapply(fund_metrics, function(df) as.data.frame(lapply(df, as.double)))]

# 使用rbindlist合并data.frames
resultDT <- DT[, rbindlist(fund_metrics), by = .(display_name, reporting_currency)]

# 添加Name和CUR列
resultDT[, c("Name", "CUR") := .(unique(DT$display_name), unique(DT$reporting_currency))]

# 查看结果
print(resultDT)

这段代码首先使用 lapply 将每个 data.frame 中的列转换为 double 类型,然后使用 rbindlist 合并它们,最后添加了 NameCUR 列。这样,你就能得到一个包含所有信息的 data.table,并确保所有列都是相同类型的。

英文:

I have a list of data.frames stored within a data.table column:

library(data.table)
DT &lt;- structure(list(display_name = c(&quot;Entity 1&quot;, &quot;Entity 2&quot;), reporting_currency = c(&quot;USD&quot;, 
                                      &quot;USD&quot;), fund_metrics = list(structure(list(metric_category = c(&quot;Partners&#39; Capital&quot;, 
     &quot;Partners&#39; Capital&quot;, &quot;Partners&#39; Capital&quot;, &quot;Partners&#39; Capital&quot;, 
     &quot;Partners&#39; Capital&quot;, &quot;Partners&#39; Capital&quot;, &quot;Partners&#39; Capital&quot;, 
     &quot;Partners&#39; Capital&quot;, &quot;Partners&#39; Capital&quot;, &quot;Partners&#39; Capital&quot;, 
     &quot;Partners&#39; Capital&quot;, &quot;Partners&#39; Capital&quot;, &quot;Partners&#39; Capital&quot;, 
     &quot;Partners&#39; Capital&quot;, &quot;Partners&#39; Capital&quot;), labeled_as = c(&quot;Total Partners&#39; Capital&quot;, 
               &quot;Total Partners&#39; Capital&quot;, &quot;Total Partners&#39; Capital&quot;, &quot;Total Partners&#39; Capital&quot;, 
               &quot;Total Partners&#39; Capital&quot;, &quot;Total Partners&#39; Capital&quot;, &quot;Total Partners&#39; Capital&quot;, 
               &quot;Total Partners&#39; Capital&quot;, &quot;Total Partners&#39; Capital&quot;, &quot;Total Partners&#39; Capital&quot;, 
               &quot;Total Partners&#39; Capital&quot;, &quot;Total Partners&#39; Capital&quot;, &quot;Total Partners&#39; Capital&quot;, 
               &quot;Total Partners&#39; Capital&quot;, &quot;Total Partners&#39; Capital&quot;), reporting_sign = c(1L, 
                                         1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), value = c(589933611, 
            5168, 49489, 49686, 59470, 72353, 232288, 
            28767, 1190516, 17154, 372091, 30719, 3472, 
            12634, 9528), date = c(&quot;2020-03-31&quot;, &quot;2019-12-31&quot;, 
                                             &quot;2019-09-30&quot;, &quot;2019-06-30&quot;, &quot;2020-06-30&quot;, &quot;2020-09-30&quot;, &quot;2020-12-31&quot;, 
                                             &quot;2021-03-31&quot;, &quot;2022-06-30&quot;, &quot;2022-03-31&quot;, &quot;2021-12-31&quot;, &quot;2021-09-30&quot;, 
                                             &quot;2021-06-30&quot;, &quot;2022-09-30&quot;, &quot;2022-12-31&quot;)), row.names = c(NA, 
       15L), class = &quot;data.frame&quot;), structure(list(metric_category = c(&quot;Partners&#39; Capital&quot;, 
                       &quot;Partners&#39; Capital&quot;, &quot;Partners&#39; Capital&quot;, &quot;Partners&#39; Capital&quot;, 
                       &quot;Partners&#39; Capital&quot;, &quot;Partners&#39; Capital&quot;, &quot;Partners&#39; Capital&quot;, 
                       &quot;Partners&#39; Capital&quot;, &quot;Partners&#39; Capital&quot;, &quot;Partners&#39; Capital&quot;, 
                       &quot;Partners&#39; Capital&quot;), labeled_as = c(&quot;Total Partners&#39; Capital&quot;, 
            &quot;Total Partners&#39; Capital&quot;, &quot;Total Partners&#39; Capital&quot;, &quot;Total Partners&#39; Capital&quot;, 
            &quot;Total Partners&#39; Capital&quot;, &quot;Total Partners&#39; Capital&quot;, &quot;Total Partners&#39; Capital&quot;, 
            &quot;Total Partners&#39; Capital&quot;, &quot;Total Partners&#39; Capital&quot;, &quot;Total Partners&#39; Capital&quot;, 
            &quot;Total Partners&#39; Capital&quot;), reporting_sign = c(1L, 1L, 1L, 1L, 
           1L, 1L, 1L, 1L, 1L, 1L, 1L), value = c(1130L, 173202L, 
  53830L, 66257L, 15L, 67968L, 6639L, 1097127L, 
  9499L, 5211L, 40217L), date = c(&quot;2020-06-30&quot;, &quot;2020-09-30&quot;, 
                                             &quot;2020-12-31&quot;, &quot;2019-12-31&quot;, &quot;2019-06-30&quot;, &quot;2019-09-30&quot;, &quot;2021-03-31&quot;, 
                                             &quot;2020-03-31&quot;, &quot;2021-12-31&quot;, &quot;2022-03-31&quot;, &quot;2022-06-30&quot;)), row.names = c(NA, 
                     11L), class = &quot;data.frame&quot;))), row.names = c(NA, -2L), class = c(&quot;data.table&quot;, 
                                      &quot;data.frame&quot;))

This is how it looks:

 DT
   display_name reporting_currency       fund_metrics
1:     Entity 1                USD &lt;data.frame[15x5]&gt;
2:     Entity 2                USD &lt;data.frame[11x5]&gt;

My desired output is to expand the data.frames and keep the information in columns display_name and reporting_currency and store this all in one data.table. The following loop achieves this result:

resDT &lt;- as.data.table(DT[1, fund_metrics])
resDT[, &quot;Name&quot;:=list(DT[1, display_name])]
resDT[, &quot;CUR&quot; :=list(DT[1, reporting_currency])]
for (i in 2:nrow(DT)) {
  
  intDT &lt;- as.data.table(DT[i, fund_metrics])
  if (nrow(intDT)&gt;0) {
    intDT[, &quot;Name&quot;:=list(DT[i, display_name])]
    intDT[, &quot;CUR&quot; :=list(DT[i, reporting_currency])]  
  }
  resDT &lt;- rbind(resDT, intDT)
  
}
resDT
      metric_category              labeled_as reporting_sign     value       date     Name CUR
 1: Partners&#39; Capital Total Partners&#39; Capital              1 589933611 2020-03-31 Entity 1 USD
 2: Partners&#39; Capital Total Partners&#39; Capital              1      5168 2019-12-31 Entity 1 USD
 3: Partners&#39; Capital Total Partners&#39; Capital              1     49489 2019-09-30 Entity 1 USD
 4: Partners&#39; Capital Total Partners&#39; Capital              1     49686 2019-06-30 Entity 1 USD
 5: Partners&#39; Capital Total Partners&#39; Capital              1     59470 2020-06-30 Entity 1 USD
...

This didn't feel very data.table efficient to me and I found a much better way of doing it on Stackoverflow.

However, this solution doesn't work for me because the columns in the data.frames do not have the same types. I get the following error:

DT[, rbindlist(fund_metrics),by=list(display_name, reporting_currency)]
Error in `[.data.table`(DT, , rbindlist(fund_metrics), by = list(display_name,  : 
  Column 4 of result for group 2 is type &#39;integer&#39; but expecting type &#39;double&#39;. Column types must be consistent for each group.

I don't quite understand why rbindlist doesn't coerce the integer to double, as the help page reads:

> If column i does not have the same type in each of the list items; e.g, the column is integer in item 1 while others are numeric, they are coerced to the highest type.

I assume double is something else than integer/numeric, but not sure. My question though is if there is an efficient way to change the column types in the data.frames nested within the data.table before using rbindlist.

答案1

得分: 1

可以尝试使用tidyr包中的unnest函数:

library(tidyr)
DT %>%
  unnest(fund_metrics)

如果想要使用data.table并仅使用rbindlist,可以按以下方式使用rbindlist(不应使用by):

copy(DT)[
  ,
  id := .I
][
  ,
  fund_metrics := NULL
][DT[
  ,
  rbindlist(fund_metrics, idcol = "id")
], on = "id"][
  ,
  id := NULL
][]
英文:

You can try unnest from package tidyr

library(tidyr)
DT %&gt;%
  unnest(fund_metrics)

which gives

# A tibble: 26 &#215; 7
   display_name reporting_currency metric_category   labeled_as   reporting_sign
   &lt;chr&gt;        &lt;chr&gt;              &lt;chr&gt;             &lt;chr&gt;                 &lt;int&gt;
 1 Entity 1     USD                Partners&#39; Capital Total Partn…              1
 2 Entity 1     USD                Partners&#39; Capital Total Partn…              1
 3 Entity 1     USD                Partners&#39; Capital Total Partn…              1
 4 Entity 1     USD                Partners&#39; Capital Total Partn…              1
 5 Entity 1     USD                Partners&#39; Capital Total Partn…              1
 6 Entity 1     USD                Partners&#39; Capital Total Partn…              1
 7 Entity 1     USD                Partners&#39; Capital Total Partn…              1
 8 Entity 1     USD                Partners&#39; Capital Total Partn…              1
 9 Entity 1     USD                Partners&#39; Capital Total Partn…              1
10 Entity 1     USD                Partners&#39; Capital Total Partn…              1
# ℹ 16 more rows
# ℹ 2 more variables: value &lt;dbl&gt;, date &lt;chr&gt;
# ℹ Use `print(n = ...)` to see more rows

If you would like to load data.table and use rbindlist only, you can use rbindlist like below (should use it without by)

copy(DT)[
  ,
  id := .I
][
  ,
  fund_metrics := NULL
][DT[
  ,
  rbindlist(fund_metrics, idcol = &quot;id&quot;)
], on = &quot;id&quot;][
  ,
  id := NULL
][]

which gives

    display_name reporting_currency   metric_category              labeled_as
 1:     Entity 1                USD Partners&#39; Capital Total Partners&#39; Capital
 2:     Entity 1                USD Partners&#39; Capital Total Partners&#39; Capital
 3:     Entity 1                USD Partners&#39; Capital Total Partners&#39; Capital
 4:     Entity 1                USD Partners&#39; Capital Total Partners&#39; Capital
 5:     Entity 1                USD Partners&#39; Capital Total Partners&#39; Capital
 6:     Entity 1                USD Partners&#39; Capital Total Partners&#39; Capital
 7:     Entity 1                USD Partners&#39; Capital Total Partners&#39; Capital
 8:     Entity 1                USD Partners&#39; Capital Total Partners&#39; Capital
 9:     Entity 1                USD Partners&#39; Capital Total Partners&#39; Capital
10:     Entity 1                USD Partners&#39; Capital Total Partners&#39; Capital
11:     Entity 1                USD Partners&#39; Capital Total Partners&#39; Capital
12:     Entity 1                USD Partners&#39; Capital Total Partners&#39; Capital
13:     Entity 1                USD Partners&#39; Capital Total Partners&#39; Capital
14:     Entity 1                USD Partners&#39; Capital Total Partners&#39; Capital
15:     Entity 1                USD Partners&#39; Capital Total Partners&#39; Capital
16:     Entity 2                USD Partners&#39; Capital Total Partners&#39; Capital
17:     Entity 2                USD Partners&#39; Capital Total Partners&#39; Capital
18:     Entity 2                USD Partners&#39; Capital Total Partners&#39; Capital
19:     Entity 2                USD Partners&#39; Capital Total Partners&#39; Capital
20:     Entity 2                USD Partners&#39; Capital Total Partners&#39; Capital
21:     Entity 2                USD Partners&#39; Capital Total Partners&#39; Capital
22:     Entity 2                USD Partners&#39; Capital Total Partners&#39; Capital
23:     Entity 2                USD Partners&#39; Capital Total Partners&#39; Capital
24:     Entity 2                USD Partners&#39; Capital Total Partners&#39; Capital
25:     Entity 2                USD Partners&#39; Capital Total Partners&#39; Capital
26:     Entity 2                USD Partners&#39; Capital Total Partners&#39; Capital
    display_name reporting_currency   metric_category              labeled_as
    reporting_sign     value       date
 1:              1 589933611 2020-03-31
 2:              1      5168 2019-12-31
 3:              1     49489 2019-09-30
 4:              1     49686 2019-06-30
 5:              1     59470 2020-06-30
 6:              1     72353 2020-09-30
 7:              1    232288 2020-12-31
 8:              1     28767 2021-03-31
 9:              1   1190516 2022-06-30
10:              1     17154 2022-03-31
11:              1    372091 2021-12-31
12:              1     30719 2021-09-30
13:              1      3472 2021-06-30
14:              1     12634 2022-09-30
15:              1      9528 2022-12-31
16:              1      1130 2020-06-30
17:              1    173202 2020-09-30
18:              1     53830 2020-12-31
19:              1     66257 2019-12-31
20:              1        15 2019-06-30
21:              1     67968 2019-09-30
22:              1      6639 2021-03-31
23:              1   1097127 2020-03-31
24:              1      9499 2021-12-31
25:              1      5211 2022-03-31
26:              1     40217 2022-06-30
    reporting_sign     value       date

huangapple
  • 本文由 发表于 2023年4月19日 17:24:57
  • 转载请务必保留本文链接:https://go.coder-hub.com/76052855.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定