按照标识重新排列,并保持其他变量不变。

huangapple go评论89阅读模式
英文:

Reshape by id and keep other variables fixed

问题

我试图做的是重新排列数据,其中数据在列中,但其余变量保持不变。

我需要的结果是这样的:

ID Location Gender TypeofHome MS.2007 MS.2008 MS.2009 MS.2010 MWC.2007 MWC.2008 MWC.2009 MWC.2010
1 EEUU M House 1500 3900 0 NA 400 0 100 NA
2 Germany F Department 0 4000 700 0 1000 500 0 50

哪一个是更好的解决方案?谢谢!

英文:

What I'm trying to do is reorder the data where the data is in columns, but the rest of the variables are maintained

       c1<- c("ID","Location", "Year","Gender", "MoneySpent", "MoneyWithCreditCard")
         c2<- c(1,"EEUU",2007,"M",1500,400)
         c3<- c(1,"EEUU",2008,"M",3900,0)
         c4<- c(1,"EEUU",2009,"M",0,100)
         c5<- c(2,"Germany",2007,"F",0,1000)
         c6<- c(2,"Germany",2008,"F",4000,500)
         c7<- c(2,"Germany",2009,"F",700,0)
         c8<- c(2,"Germany",2010,"F",0,50)
         Df<-data.frame(rbind(c2,c3,c4,c5,c6,c7,c8))
         colnames(Df)<-c1   

#   ID Location Year Gender MoneySpent   MoneyWithCreditCard   TypeofHome
#c2  1     EEUU 2007      M       1500        400                House
#c3  1     EEUU 2008      M       3900         0                 House
#c4  1     EEUU 2009      M          0        100                House
#c5  2  Germany 2007      F          0        1000               Department
#c6  2  Germany 2008      F       4000        500                Department
#c7  2  Germany 2009      F        700         0                 Department
#c8  2  Germany 2010      F          0         50                Department

The result I need is this one:

# ID Location Gender TypeofHome MS.2007 MS.2008 MS.2009 MS.2010 MWC.2007 MWC.2008 MWC.2009 MWC.2010
# 1   EEUU      M      House     1500    3900      0      NA      400       0        100     NA
# 2   Germany   F      Department 0      4000     700      0     1000      500       0       50

Which one is the better solution? Thanks btw!

答案1

得分: 4

这将重新命名原始数据中的列,因此只需要进行一次数据透视:

library(dplyr)
library(tidyr)

Df %>%
  rename(MS = MoneySpent, MWC = CreditCard) %>%
  pivot_wider(names_from = c("Year"),
              values_from = c("MS", "MWC"))
# # A tibble: 2 x 11
#   ID    Location Gender MS_2007 MS_2008 MS_2009 MS_2010 MWC_2007 MWC_2008 MWC_2009 MWC_2010
#   <fct> <fct>    <fct>  <fct>   <fct>   <fct>   <fct>   <fct>    <fct>    <fct>    <fct>
# 1 1     EEUU     M      1500    3900    0       NA      400      0        100      NA      
# 2 2     Germany  F      0       4000    700     0       1000     500      0        50 

请注意,我已经将代码部分保留为原文,只翻译了代码之外的文本。

英文:

This renames the columns in the original data so only a single pivot is needed:

library(dplyr)
library(tidyr)

Df %&gt;%
  rename(MS = MoneySpent, MWC = CreditCard) %&gt;%
  pivot_wider(names_from = c(&quot;Year&quot;),
              values_from = c(&quot;MS&quot;, &quot;MWC&quot;))
# # A tibble: 2 x 11
#   ID    Location Gender MS_2007 MS_2008 MS_2009 MS_2010 MWC_2007 MWC_2008 MWC_2009 MWC_2010
#   &lt;fct&gt; &lt;fct&gt;    &lt;fct&gt;  &lt;fct&gt;   &lt;fct&gt;   &lt;fct&gt;   &lt;fct&gt;   &lt;fct&gt;    &lt;fct&gt;    &lt;fct&gt;    &lt;fct&gt;   
# 1 1     EEUU     M      1500    3900    0       NA      400      0        100      NA      
# 2 2     Germany  F      0       4000    700     0       1000     500      0        50 

答案2

得分: 3

这是使用 pivot_longerpivot_wider 的选项。我们首先使用 pivot_longer 在 'Money' 列上将数据变成'长'格式,根据列名添加 'MC' 或 'MWC' 来更改 'Year',然后使用 pivot_wider 将数据变成'宽'格式。

library(dplyr)
library(tidyr)
library(stringr)
Df %>%
    pivot_longer(cols = starts_with("Money")) %>%
    mutate(Year = case_when(name == "MoneySpent" ~ str_c("MS.", Year),
                            TRUE ~ str_c("MWC.", Year))) %>%
    select(-name) %>%
    pivot_wider(names_from = Year, values_from = value)
# ID Location Gender TypeofHome MS.2007 MWC.2007 MS.2008 MWC.2008 MS.2009 MWC.2009 MS.2010 MWC.2010
# 1  1     EEUU      M      House    1500      400    3900        0       0      100      NA       NA
# 2  2  Germany      F Department       0     1000    4000      500     700        0       0       50

或者使用 rename_at

Df %>%
   rename_at(vars(matches("Money")), ~ str_remove_all(., "[a-z]+")) %>%
   pivot_wider(names_from = Year, values_from = starts_with("M"))
# ID Location Gender MS_2007 MS_2008 MS_2009 MS_2010 MWCC_2007 MWCC_2008 MWCC_2009 MWCC_2010
# 1  1     EEUU      M    1500    3900       0     NA        400         0       100      NA
# 2  2  Germany      F       0    4000     700      0       1000       500         0       50
英文:

Here is an option with pivot_longer and pivot_wider. We first reshape to 'long' format with pivot_longer on the 'Money' columns, change the 'Year' by appending 'MC' or 'MWC' based on the column names, and do a pivot_wider to 'wide' format

library(dplyr)
library(tidyr)
library(stringr)
Df %&gt;% 
    pivot_longer(cols = starts_with(&quot;Money&quot;)) %&gt;% 
    mutate(Year = case_when(name == &quot;MoneySpent&quot; ~ str_c(&quot;MS.&quot;, Year),
                            TRUE ~ str_c(&quot;MWC.&quot;, Year))) %&gt;% 
    select(-name) %&gt;%
    pivot_wider(names_from = Year, values_from = value)
#ID Location Gender TypeofHome MS.2007 MWC.2007 MS.2008 MWC.2008 MS.2009 MWC.2009 MS.2010 MWC.2010
#1  1     EEUU      M      House    1500      400    3900        0       0      100      NA       NA
#2  2  Germany      F Department       0     1000    4000      500     700        0       0       50

Or using rename_at

Df %&gt;% 
   rename_at(vars(matches(&quot;Money&quot;)), ~ str_remove_all(., &quot;[a-z]+&quot;)) %&gt;% 
   pivot_wider(names_from = Year, values_from = starts_with(&quot;M&quot;))
# ID Location Gender MS_2007 MS_2008 MS_2009 MS_2010 MWCC_2007 MWCC_2008 MWCC_2009 MWCC_2010
#1  1     EEUU      M    1500    3900       0    &lt;NA&gt;       400         0       100      &lt;NA&gt;
#2  2  Germany      F       0    4000     700       0      1000       500         0        50

答案3

得分: 1

你可以尝试使用基本的R函数reshape

reshape(setNames(Df, c(names(Df)[1:4], "MS", "MWS", "TypeOfHome")), 
        idvar=c("ID", "Location", "Gender", "TypeOfHome"),
        timevar="Year", direction="wide")
#   ID Location Gender TypeOfHome MS.2007 MWS.2007 MS.2008 MWS.2008 MS.2009 MWS.2009 MS.2010 MWS.2010
# 1  1     EEUU      M      House    1500      400    3900        0       0      100    <NA>     <NA>
# 4  2  Germany      F Appartment       0     1000    4000      500     700        0       0       50
英文:

You may want to try reshape from base R.

reshape(setNames(Df, c(names(Df)[1:4], &quot;MS&quot;, &quot;MWS&quot;, &quot;TypeOfHome&quot;)), 
        idvar=c(&quot;ID&quot;, &quot;Location&quot;, &quot;Gender&quot;, &quot;TypeOfHome&quot;),
        timevar=&quot;Year&quot;, direction=&quot;wide&quot;)
#   ID Location Gender TypeOfHome MS.2007 MWS.2007 MS.2008 MWS.2008 MS.2009 MWS.2009 MS.2010 MWS.2010
# 1  1     EEUU      M      House    1500      400    3900        0       0      100    &lt;NA&gt;     &lt;NA&gt;
# 4  2  Germany      F Appartment       0     1000    4000      500     700        0       0       50

huangapple
  • 本文由 发表于 2020年1月7日 02:27:07
  • 转载请务必保留本文链接:https://go.coder-hub.com/59617107.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定