How can I transpose a data frame in R so that a certain column becomes column names and another column fills the values?

Question

I have a data set in which I want to turn the entries of a certain column into column names and then populate those columns with the corresponding entries from the original data frame.

My data frame looks as follows:

Trip Vessel Species Weight
1 A BET 54
2 B YFT 65
1 D SKJ 12
3 B YFT 56
1 C ALB 98
4 B YFT 35
3 D BET 78
2 C ALB 76
1 A BET 23
2 B SKJ 76

I want to transpose it so that the species become the column names and each cell holds the weight of that species per vessel and trip. The weights should also be summed when a species occurs more than once for the same vessel and trip.

Trip Vessel BET YFT SKJ ALB
1 A 77 0 0 0
2 B 0 65 76 0
1 D 0 0 12 0
3 B 0 56 0 0
1 C 0 0 0 98
4 B 0 35 0 0
3 D 78 0 0 0
2 C 0 0 0 76

I have tried a couple of tidyverse, tidyr, and dplyr options that I found online, but they all tell me that these packages were built under previous versions of R and are therefore not working. I've also used the spread function from the tidyverse, but it does not add the values the way I want it to.
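A likely reason spread does not add the values is that the data contain more than one row for the same Trip, Vessel, and Species, so there is no single value to place in a cell. The base R sketch below rebuilds the sample data (the name df and the helper names key_cols and dups are chosen here for illustration) and lists the rows that share a key.

```r
# Sample data from the question, entered by hand
df <- data.frame(
  Trip    = c(1, 2, 1, 3, 1, 4, 3, 2, 1, 2),
  Vessel  = c("A", "B", "D", "B", "C", "B", "D", "C", "A", "B"),
  Species = c("BET", "YFT", "SKJ", "YFT", "ALB", "YFT", "BET", "ALB", "BET", "SKJ"),
  Weight  = c(54, 65, 12, 56, 98, 35, 78, 76, 23, 76)
)

# Columns that together should identify one cell of the wide table
key_cols <- c("Trip", "Vessel", "Species")

# Flag every row whose key appears more than once (both the first and later copies)
is_dup <- duplicated(df[key_cols]) | duplicated(df[key_cols], fromLast = TRUE)
dups <- df[is_dup, ]
print(dups)
```

Any rows printed here have to be combined (for example, summed) before or during the reshape.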

The version of R I am using is:
RStudio 2023.03.1+446 "Cherry Blossom"
Release (6e31ffc3ef2a1f81d377eeccab71ddc11cfbd29e, 2023-05-09) for windows
Mozilla/5.0 (Windows NT 10.0; Win64; x64)
AppleWebKit/537.36 (KHTML, like Gecko)
RStudio/2023.03.1+446
Chrome/108.0.5359.179
Electron/22.0.3
Safari/537.36
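
Note that the details above describe the RStudio IDE rather than R itself. A minimal sketch for reporting the version of R and of the packages mentioned, assuming only base R functions (the variable names r_version and pkgs are illustrative):

```r
# Version of the R interpreter itself (distinct from the RStudio IDE version)
r_version <- R.version.string
print(r_version)

# Installed versions of the packages mentioned in the question, if present
pkgs <- c("tidyr", "dplyr")
for (p in pkgs) {
  if (requireNamespace(p, quietly = TRUE)) {
    cat(p, as.character(packageVersion(p)), "\n")
  } else {
    cat(p, "is not installed\n")
  }
}
```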

Thanks,

Answer 1

Score: 2

You'll need to use values_fn = sum and values_fill = 0 to sum the corresponding values and set missing entries to zero.

Also note the spread function was superseded by pivot_wider in the tidyr package.

library(tidyr)

pivot_wider(df, names_from = Species, values_from = Weight, values_fn = sum, values_fill = 0)

# A tibble: 8 × 6
   Trip Vessel   BET   YFT   SKJ   ALB
  <int> <chr>  <int> <int> <int> <int>
1     1 A         77     0     0     0
2     2 B          0    65    76     0
3     1 D          0     0    12     0
4     3 B          0    56     0     0
5     1 C          0     0     0    98
6     4 B          0    35     0     0
7     3 D         78     0     0     0
8     2 C          0     0     0    76
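
To try the call above end to end, the following sketch rebuilds the question's data by hand and checks two properties of the result: the duplicated Trip 1 / Vessel A / BET rows are summed, and the total weight is unchanged. It assumes tidyr is installed; the names wide and species_cols are illustrative.

```r
library(tidyr)

# Sample data from the question, entered by hand
df <- data.frame(
  Trip    = c(1, 2, 1, 3, 1, 4, 3, 2, 1, 2),
  Vessel  = c("A", "B", "D", "B", "C", "B", "D", "C", "A", "B"),
  Species = c("BET", "YFT", "SKJ", "YFT", "ALB", "YFT", "BET", "ALB", "BET", "SKJ"),
  Weight  = c(54, 65, 12, 56, 98, 35, 78, 76, 23, 76)
)

# values_fn = sum combines duplicate cells; values_fill = 0 fills absent combinations
wide <- pivot_wider(df, names_from = Species, values_from = Weight,
                    values_fn = sum, values_fill = 0)

species_cols <- c("BET", "YFT", "SKJ", "ALB")

# The two BET rows for Trip 1 / Vessel A (54 and 23) collapse into one cell
stopifnot(wide$BET[wide$Trip == 1 & wide$Vessel == "A"] == 77)

# Reshaping should not change the total weight
stopifnot(sum(unlist(wide[species_cols])) == sum(df$Weight))
```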

Answer 2

Score: 1

Maybe you should keep both your R version and the related packages, e.g., dplyr, tidyr and tidyverse, up to date. Otherwise, you can try reshape from base R if you don't want to depend on any additional packages.

dfout <- reshape(
    aggregate(Weight ~ ., df, sum),
    direction = "wide",
    idvar = c("Trip", "Vessel"),
    timevar = "Species"
)

dfout[is.na(dfout)] <- 0

which gives

> dfout
  Trip Vessel Weight.ALB Weight.BET Weight.SKJ Weight.YFT
1    1      C         98          0          0          0
2    2      C         76          0          0          0
3    1      A          0         77          0          0
4    3      D          0         78          0          0
5    2      B          0          0         76         65
6    1      D          0          0         12          0
8    3      B          0          0          0         56
9    4      B          0          0          0         35
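
The column names produced by reshape carry a "Weight." prefix, whereas the question asks for plain species names. One possible follow-up, sketched here with base R only, strips the prefix and renumbers the rows. The data frame df is rebuilt by hand from the question so the block runs on its own.

```r
# Sample data from the question, entered by hand
df <- data.frame(
  Trip    = c(1, 2, 1, 3, 1, 4, 3, 2, 1, 2),
  Vessel  = c("A", "B", "D", "B", "C", "B", "D", "C", "A", "B"),
  Species = c("BET", "YFT", "SKJ", "YFT", "ALB", "YFT", "BET", "ALB", "BET", "SKJ"),
  Weight  = c(54, 65, 12, 56, 98, 35, 78, 76, 23, 76)
)

# Sum weights per Trip/Vessel/Species, then spread Species into columns
dfout <- reshape(
  aggregate(Weight ~ ., df, sum),
  direction = "wide",
  idvar = c("Trip", "Vessel"),
  timevar = "Species"
)

# Combinations that never occur come out as NA; treat them as zero weight
dfout[is.na(dfout)] <- 0

# Drop the "Weight." prefix so the columns are named after the species
names(dfout) <- sub("^Weight\\.", "", names(dfout))

# Renumber the rows consecutively
rownames(dfout) <- NULL
print(dfout)
```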

huangapple
  • Posted on May 22, 2023, 19:27:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76305682.html