Complex Long-Wide Dataset to Long Dataset in R
Question
I have a complex dataset that looks like this:
df1 <- tibble::tribble(
  ~"Canada > London", ~"", ~"Notes", ~"United Kingdom > London", ~"", ~"",
  "Restaurant", "Price", "Range", "Restaurant", "Price", "Range",
  "Fried beef", "27", "25-30", "Fried beef", "29", "25 - 35",
  "Fried potato", "5", "3 - 8", "Fried potato", "8", "3 - 8",
  "Bar", "Price", "Range", "Price", "Range", "",
  "Beer Lager", "5", "4 - 8", "Beer Lager", "6", "4 - 8",
  "Beer Dark", "4", "3 - 7", "Beer Dark", "5", "3 - 7")
Or, for visual representation:
It is long in its parameters (Beer Lager, Beer Dark, ...) and wide in its data inputs (several side-by-side blocks such as Canada > London and United Kingdom > London).
The desired output would be two datasets that look like this:
- The first dataset (the values):
- The second dataset (the ranges):
Any suggestions would be much appreciated.
Answer 1
Score: 2
Your data is neither wide nor long; it is a messy table that needs some cleaning before it becomes tidy data. Afterwards you can get your desired tables using tidyr::pivot_wider:
library(dplyr)
library(tidyr)
library(purrr)
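# Tidy one three-column block (product, price, range) and tag it with its place
tidy_data <- function(.data, cols) {
  .data <- .data[cols]
  # The first column name of each block holds the place, e.g. "Canada > London"
  place <- names(.data)[[1]]
  .data |>
    rename(product = 1, price = 2, range = 3) |>
    # Drop the header rows embedded in the data (the "Restaurant" and "Bar" lines)
    filter(!price %in% c("Price", "Range")) |>
    mutate(place = place)
}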
df1_tidy <- purrr::map_dfr(list(1:3, 4:6), tidy_data, .data = df1)
df1_tidy |>
  select(place, product, price) |>
  pivot_wider(names_from = product, values_from = price)
#> # A tibble: 2 × 5
#>   place                   `Fried beef` `Fried potato` `Beer Lager` `Beer Dark`
#>   <chr>                   <chr>        <chr>          <chr>        <chr>
#> 1 Canada > London         27           5              5            4
#> 2 United Kingdom > London 29           8              6            5
df1_tidy |>
  select(place, product, range) |>
  pivot_wider(names_from = product, values_from = range, names_glue = "{product} Range")
#> # A tibble: 2 × 5
#>   place                   `Fried beef Range` Fried potato Rang…¹ Beer …² Beer …³
#>   <chr>                   <chr>              <chr>               <chr>   <chr>
#> 1 Canada > London         25-30              3 - 8               4 - 8   3 - 7
#> 2 United Kingdom > London 25 - 35            3 - 8               4 - 8   3 - 7
#> # … with abbreviated variable names ¹`Fried potato Range`, ²`Beer Lager Range`,
#> #   ³`Beer Dark Range`
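Both tables above hold character columns. If numeric values are needed, one possible follow-up is to convert the types on the tidy data before pivoting. The sketch below is not part of the original answer: it rebuilds the tidy table by hand so it runs on its own, and the names df1_typed, range_low, range_high and prices_wide are made up for illustration. It assumes every range has the form "low - high" with optional spaces around the hyphen.

```r
library(dplyr)
library(tidyr)

# Hand-built copy of the tidy table (place, product, price, range), all character
df1_tidy <- tibble::tibble(
  place   = rep(c("Canada > London", "United Kingdom > London"), each = 4),
  product = rep(c("Fried beef", "Fried potato", "Beer Lager", "Beer Dark"), times = 2),
  price   = c("27", "5", "5", "4", "29", "8", "6", "5"),
  range   = c("25-30", "3 - 8", "4 - 8", "3 - 7", "25 - 35", "3 - 8", "4 - 8", "3 - 7")
)

# Assumption: each range looks like "low - high", with or without spaces
df1_typed <- df1_tidy |>
  mutate(price = as.numeric(price)) |>
  separate(range, into = c("range_low", "range_high"),
           sep = "\\s*-\\s*", convert = TRUE)

# Same pivot as above, now producing numeric price columns
prices_wide <- df1_typed |>
  select(place, product, price) |>
  pivot_wider(names_from = product, values_from = price)
```

Splitting the range into two columns while the data is still long keeps the pivot step unchanged; only the values_from column needs to be swapped.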
Answer 2
Score: 1
I agree with @stefan. You actually have 4 tables, or 2, depending on how you look at it. Below are two functions that start the cleaning and formatting process: the first splits the data frame by row, and the second splits the resulting pieces by column. After that it is easier to format, clean, and merge the data frames into one.
library(tidyverse)
df0 = tibble::tribble(~"Canada > London", ~"", ~"Notes", ~"United Kingdom > London", ~"", ~"",
"Restaurant", "Price", "Range", "Restaurant", "Price", "Range",
"Fried beef", "27", "25-30", "Fried beef", "29", "25 - 35",
"Fried potato", "5", "3 - 8", "Fried potato", "8", "3 - 8",
"Bar", "Price", "Range", "Price", "Range", "",
"Beer Lager", "5", "4 - 8", "Beer Lager", "6", "4 - 8",
"Beer Dark", "4", "3 - 7", "Beer Dark", "5", "3 - 7")
split_rows = function(df){
  # rows where a sub-df starts: its header row has "Price" in column 2
  df_breaks = which(df[[2]] == "Price")

  # list to populate in the loop with sub-dfs
  df_list = list()

  for(i in seq_along(df_breaks)){
    # get start of sub-df
    start = df_breaks[i]

    # get end of sub-df
    if(i == length(df_breaks)){
      end = nrow(df) # if it's the last one, set it to the last row of the original df
    } else {
      end = df_breaks[i+1]-1 # else, set it to the next start - 1
    }

    # subset df
    df_temp = df[start:end,]

    # first row as header
    colnames(df_temp) = unlist(df_temp[1,], use.names = FALSE)
    df_temp = df_temp[-1,]

    # store in df_list
    df_list[[i]] = df_temp
  }
  return(df_list)
}
split_cols = function(df_list,second_df_col_start = 4){
  df_list = lapply(df_list, function(df){
    df1 = df[,1:(second_df_col_start-1)]
    df2 = df[,second_df_col_start:ncol(df)]
    return(list(df1,df2))
  })
  return(df_list)
}
output = split_rows(df0) %>%
  split_cols()
Output:
[[1]]
[[1]][[1]]
# A tibble: 2 × 3
  Restaurant   Price Range
  <chr>        <chr> <chr>
1 Fried beef   27    25-30
2 Fried potato 5     3 - 8

[[1]][[2]]
# A tibble: 2 × 3
  Restaurant   Price Range
  <chr>        <chr> <chr>
1 Fried beef   29    25 - 35
2 Fried potato 8     3 - 8

[[2]]
[[2]][[1]]
# A tibble: 2 × 3
  Bar        Price Range
  <chr>      <chr> <chr>
1 Beer Lager 5     4 - 8
2 Beer Dark  4     3 - 7

[[2]][[2]]
# A tibble: 2 × 3
  Price      Range ``
  <chr>      <chr> <chr>
1 Beer Lager 6     4 - 8
2 Beer Dark  5     3 - 7
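The remaining merge step might look roughly like the sketch below. It is not part of the original answer: it builds a stand-in for the nested list by hand, and the helper make_tbl, the places vector and the result name merged are hypothetical. It assumes each sub-table has its columns in product, price, range order (so they can be renamed by position, which also sidesteps the mislabeled header in the last table) and that the second element of each pair belongs to "United Kingdom > London".

```r
# Hypothetical stand-in for the nested list produced by split_rows() and split_cols()
make_tbl <- function(product, price, range, header) {
  stats::setNames(
    data.frame(product, price, range, stringsAsFactors = FALSE),
    header
  )
}
output <- list(
  list(
    make_tbl(c("Fried beef", "Fried potato"), c("27", "5"), c("25-30", "3 - 8"),
             c("Restaurant", "Price", "Range")),
    make_tbl(c("Fried beef", "Fried potato"), c("29", "8"), c("25 - 35", "3 - 8"),
             c("Restaurant", "Price", "Range"))
  ),
  list(
    make_tbl(c("Beer Lager", "Beer Dark"), c("5", "4"), c("4 - 8", "3 - 7"),
             c("Bar", "Price", "Range")),
    make_tbl(c("Beer Lager", "Beer Dark"), c("6", "5"), c("4 - 8", "3 - 7"),
             c("Price", "Range", ""))
  )
)

# Assumption: within each section, element j belongs to places[j]
places <- c("Canada > London", "United Kingdom > London")

merged <- lapply(output, function(section) {
  parts <- Map(function(df, place) {
    # Rename by position, because the headers differ between sub-tables
    df <- stats::setNames(df, c("product", "price", "range"))
    df$place <- place
    df
  }, section, places)
  do.call(rbind, parts)
})
merged <- do.call(rbind, merged)
```

The result is one long data frame with product, price, range and place columns, which can then be pivoted wider as in the first answer.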