How to convert the type of a column stored in a data.frame within a data.table column
Question
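Before calling rbindlist, you can use lapply to give the relevant columns of the nested data.frames the same type. The example below converts only the value column to double, because applying as.double to every column would turn the character columns (metric_category, labeled_as, date) into NA:
library(data.table)
# Convert the value column of each nested data.frame to double
DT[, fund_metrics := lapply(fund_metrics, function(df) { df$value <- as.double(df$value); df })]
# Bind the data.frames group by group with rbindlist
resultDT <- DT[, rbindlist(fund_metrics), by = .(display_name, reporting_currency)]
# The grouping columns are already in the result; rename them to Name and CUR
setnames(resultDT, c("display_name", "reporting_currency"), c("Name", "CUR"))
# Inspect the result
print(resultDT)
This code first uses lapply to convert the value column of each data.frame to double, then binds the data.frames with rbindlist, and finally renames the grouping columns to Name and CUR. The result is a single data.table that holds all the information, with consistent column types across groups.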
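Blanket conversion with as.double would also hit character columns such as date. A narrower sketch, assuming only integer columns need promoting, is shown below. The helper promote_integers and the toy data are hypothetical and made up for illustration; they are not part of the original thread.

```r
library(data.table)

# Hypothetical helper: promote integer columns to double, leave all others untouched
promote_integers <- function(df) {
  is_int <- vapply(df, is.integer, logical(1))
  df[is_int] <- lapply(df[is_int], as.double)
  df
}

# Toy stand-in for DT (values are invented)
toy <- data.table(display_name = c("Entity 1", "Entity 2"), reporting_currency = "USD")
toy[, fund_metrics := list(list(
  data.frame(value = c(1.5, 2), date = c("2020-03-31", "2019-12-31")),
  data.frame(value = c(3L, 4L, 5L), date = c("2020-06-30", "2020-09-30", "2020-12-31"))
))]

# Wrapping the result in list() keeps the assignment a list column even for a one-row table
toy[, fund_metrics := list(lapply(fund_metrics, promote_integers))]

# Types now agree across groups, so grouping with by= no longer trips the type check
res <- toy[, rbindlist(fund_metrics), by = .(display_name, reporting_currency)]
```

Character columns pass through unchanged, so no NA values are introduced.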
I have a list of data.frames stored within a data.table column:
library(data.table)
DT <- structure(list(display_name = c("Entity 1", "Entity 2"), reporting_currency = c("USD",
"USD"), fund_metrics = list(structure(list(metric_category = c("Partners' Capital",
"Partners' Capital", "Partners' Capital", "Partners' Capital",
"Partners' Capital", "Partners' Capital", "Partners' Capital",
"Partners' Capital", "Partners' Capital", "Partners' Capital",
"Partners' Capital", "Partners' Capital", "Partners' Capital",
"Partners' Capital", "Partners' Capital"), labeled_as = c("Total Partners' Capital",
"Total Partners' Capital", "Total Partners' Capital", "Total Partners' Capital",
"Total Partners' Capital", "Total Partners' Capital", "Total Partners' Capital",
"Total Partners' Capital", "Total Partners' Capital", "Total Partners' Capital",
"Total Partners' Capital", "Total Partners' Capital", "Total Partners' Capital",
"Total Partners' Capital", "Total Partners' Capital"), reporting_sign = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), value = c(589933611,
5168, 49489, 49686, 59470, 72353, 232288,
28767, 1190516, 17154, 372091, 30719, 3472,
12634, 9528), date = c("2020-03-31", "2019-12-31",
"2019-09-30", "2019-06-30", "2020-06-30", "2020-09-30", "2020-12-31",
"2021-03-31", "2022-06-30", "2022-03-31", "2021-12-31", "2021-09-30",
"2021-06-30", "2022-09-30", "2022-12-31")), row.names = c(NA,
15L), class = "data.frame"), structure(list(metric_category = c("Partners' Capital",
"Partners' Capital", "Partners' Capital", "Partners' Capital",
"Partners' Capital", "Partners' Capital", "Partners' Capital",
"Partners' Capital", "Partners' Capital", "Partners' Capital",
"Partners' Capital"), labeled_as = c("Total Partners' Capital",
"Total Partners' Capital", "Total Partners' Capital", "Total Partners' Capital",
"Total Partners' Capital", "Total Partners' Capital", "Total Partners' Capital",
"Total Partners' Capital", "Total Partners' Capital", "Total Partners' Capital",
"Total Partners' Capital"), reporting_sign = c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), value = c(1130L, 173202L,
53830L, 66257L, 15L, 67968L, 6639L, 1097127L,
9499L, 5211L, 40217L), date = c("2020-06-30", "2020-09-30",
"2020-12-31", "2019-12-31", "2019-06-30", "2019-09-30", "2021-03-31",
"2020-03-31", "2021-12-31", "2022-03-31", "2022-06-30")), row.names = c(NA,
11L), class = "data.frame"))), row.names = c(NA, -2L), class = c("data.table",
"data.frame"))
This is how it looks:
DT
display_name reporting_currency fund_metrics
1: Entity 1 USD <data.frame[15x5]>
2: Entity 2 USD <data.frame[11x5]>
My desired output is to expand the data.frames, keep the information in the columns display_name and reporting_currency, and store it all in one data.table. The following loop achieves this result:
resDT <- as.data.table(DT[1, fund_metrics])
resDT[, "Name":=list(DT[1, display_name])]
resDT[, "CUR" :=list(DT[1, reporting_currency])]
for (i in 2:nrow(DT)) {
  intDT <- as.data.table(DT[i, fund_metrics])
  if (nrow(intDT) > 0) {
    intDT[, "Name" := list(DT[i, display_name])]
    intDT[, "CUR" := list(DT[i, reporting_currency])]
  }
  resDT <- rbind(resDT, intDT)
}
resDT
metric_category labeled_as reporting_sign value date Name CUR
1: Partners' Capital Total Partners' Capital 1 589933611 2020-03-31 Entity 1 USD
2: Partners' Capital Total Partners' Capital 1 5168 2019-12-31 Entity 1 USD
3: Partners' Capital Total Partners' Capital 1 49489 2019-09-30 Entity 1 USD
4: Partners' Capital Total Partners' Capital 1 49686 2019-06-30 Entity 1 USD
5: Partners' Capital Total Partners' Capital 1 59470 2020-06-30 Entity 1 USD
...
This didn't feel like efficient data.table code to me, and I found a much better way of doing it on Stack Overflow. However, that solution doesn't work for me because the columns in the data.frames do not have the same types. I get the following error:
DT[, rbindlist(fund_metrics),by=list(display_name, reporting_currency)]
Error in `[.data.table`(DT, , rbindlist(fund_metrics), by = list(display_name, :
Column 4 of result for group 2 is type 'integer' but expecting type 'double'. Column types must be consistent for each group.
I don't quite understand why rbindlist doesn't coerce the integer to double, as the help page reads:
> If column i does not have the same type in each of the list items; e.g, the column is integer in item 1 while others are numeric, they are coerced to the highest type.
I assume double is something other than integer/numeric, but I'm not sure. My question, though, is whether there is an efficient way to change the column types in the data.frames nested within the data.table before using rbindlist.
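On the type question: in R, "numeric" and "double" refer to the same storage type, while integer is a separate one. The help-page coercion applies across the items passed to one rbindlist call. With by=, each group hands rbindlist a single data.frame, so there is nothing to coerce, and the consistency check quoted in the error is made by `[.data.table` across the group results. A minimal sketch on made-up toy data (not the original DT):

```r
library(data.table)

# Toy stand-in for DT (values are invented)
toy <- data.table(display_name = c("Entity 1", "Entity 2"), reporting_currency = "USD")
toy[, fund_metrics := list(list(
  data.frame(value = c(1.5, 2), date = c("2020-03-31", "2019-12-31")),
  data.frame(value = c(3L, 4L, 5L), date = c("2020-06-30", "2020-09-30", "2020-12-31"))
))]

# Storage type of value in each nested data.frame: one double, one integer
value_types <- vapply(toy$fund_metrics, function(df) typeof(df$value), character(1))

# One rbindlist() call sees both items and coerces value to the higher type
combined <- rbindlist(toy$fund_metrics)

# Per-group call: captured with tryCatch because it may fail as in the question,
# depending on the data.table version
by_attempt <- tryCatch(
  toy[, rbindlist(fund_metrics), by = .(display_name, reporting_currency)],
  error = function(e) conditionMessage(e)
)
```

Here combined$value is double, which matches the documented behaviour when rbindlist is called once on the whole list.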
Answer 1
Score: 1
You can try unnest from the tidyr package:
library(tidyr)
DT %>%
unnest(fund_metrics)
which gives
# A tibble: 26 × 7
display_name reporting_currency metric_category labeled_as reporting_sign
<chr> <chr> <chr> <chr> <int>
1 Entity 1 USD Partners' Capital Total Partn… 1
2 Entity 1 USD Partners' Capital Total Partn… 1
3 Entity 1 USD Partners' Capital Total Partn… 1
4 Entity 1 USD Partners' Capital Total Partn… 1
5 Entity 1 USD Partners' Capital Total Partn… 1
6 Entity 1 USD Partners' Capital Total Partn… 1
7 Entity 1 USD Partners' Capital Total Partn… 1
8 Entity 1 USD Partners' Capital Total Partn… 1
9 Entity 1 USD Partners' Capital Total Partn… 1
10 Entity 1 USD Partners' Capital Total Partn… 1
# ℹ 16 more rows
# ℹ 2 more variables: value <dbl>, date <chr>
# ℹ Use `print(n = ...)` to see more rows
If you would rather load only data.table and use rbindlist, you can use it as shown below (it should be called without by):
copy(DT)[
,
id := .I
][
,
fund_metrics := NULL
][DT[
,
rbindlist(fund_metrics, idcol = "id")
], on = "id"][
,
id := NULL
][]
which gives
display_name reporting_currency metric_category labeled_as
1: Entity 1 USD Partners' Capital Total Partners' Capital
2: Entity 1 USD Partners' Capital Total Partners' Capital
3: Entity 1 USD Partners' Capital Total Partners' Capital
4: Entity 1 USD Partners' Capital Total Partners' Capital
5: Entity 1 USD Partners' Capital Total Partners' Capital
6: Entity 1 USD Partners' Capital Total Partners' Capital
7: Entity 1 USD Partners' Capital Total Partners' Capital
8: Entity 1 USD Partners' Capital Total Partners' Capital
9: Entity 1 USD Partners' Capital Total Partners' Capital
10: Entity 1 USD Partners' Capital Total Partners' Capital
11: Entity 1 USD Partners' Capital Total Partners' Capital
12: Entity 1 USD Partners' Capital Total Partners' Capital
13: Entity 1 USD Partners' Capital Total Partners' Capital
14: Entity 1 USD Partners' Capital Total Partners' Capital
15: Entity 1 USD Partners' Capital Total Partners' Capital
16: Entity 2 USD Partners' Capital Total Partners' Capital
17: Entity 2 USD Partners' Capital Total Partners' Capital
18: Entity 2 USD Partners' Capital Total Partners' Capital
19: Entity 2 USD Partners' Capital Total Partners' Capital
20: Entity 2 USD Partners' Capital Total Partners' Capital
21: Entity 2 USD Partners' Capital Total Partners' Capital
22: Entity 2 USD Partners' Capital Total Partners' Capital
23: Entity 2 USD Partners' Capital Total Partners' Capital
24: Entity 2 USD Partners' Capital Total Partners' Capital
25: Entity 2 USD Partners' Capital Total Partners' Capital
26: Entity 2 USD Partners' Capital Total Partners' Capital
display_name reporting_currency metric_category labeled_as
reporting_sign value date
1: 1 589933611 2020-03-31
2: 1 5168 2019-12-31
3: 1 49489 2019-09-30
4: 1 49686 2019-06-30
5: 1 59470 2020-06-30
6: 1 72353 2020-09-30
7: 1 232288 2020-12-31
8: 1 28767 2021-03-31
9: 1 1190516 2022-06-30
10: 1 17154 2022-03-31
11: 1 372091 2021-12-31
12: 1 30719 2021-09-30
13: 1 3472 2021-06-30
14: 1 12634 2022-09-30
15: 1 9528 2022-12-31
16: 1 1130 2020-06-30
17: 1 173202 2020-09-30
18: 1 53830 2020-12-31
19: 1 66257 2019-12-31
20: 1 15 2019-06-30
21: 1 67968 2019-09-30
22: 1 6639 2021-03-31
23: 1 1097127 2020-03-31
24: 1 9499 2021-12-31
25: 1 5211 2022-03-31
26: 1 40217 2022-06-30
reporting_sign value date
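As a variation on the join above, the identifying columns can be repeated once per nested row and then bound column-wise to a single rbindlist result. This is only a sketch on made-up toy data; the Name and CUR column names follow the question's desired output, and the approach assumes every nested data.frame has the same columns.

```r
library(data.table)

# Toy stand-in for DT (values are invented)
toy <- data.table(display_name = c("Entity 1", "Entity 2"), reporting_currency = "USD")
toy[, fund_metrics := list(list(
  data.frame(value = c(1.5, 2), date = c("2020-03-31", "2019-12-31")),
  data.frame(value = c(3L, 4L, 5L), date = c("2020-06-30", "2020-09-30", "2020-12-31"))
))]

# Number of rows in each nested data.frame
n_rows <- vapply(toy$fund_metrics, nrow, integer(1))

# Repeat each outer row once per nested row, then attach the stacked metrics
res <- cbind(
  toy[rep(seq_len(nrow(toy)), n_rows), .(Name = display_name, CUR = reporting_currency)],
  rbindlist(toy$fund_metrics)
)
```

Because rbindlist is called once on the whole list, the integer and double value columns are coerced to double, and an entity whose data.frame has zero rows simply contributes no rows.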