将数据从一个列复制到另一个列,如果另一个列中的条件满足。

huangapple go评论47阅读模式
英文:

Copy data from one column to another column if condition in another column is met

问题

我试图创建一个包含列标题 study_id、Conflict 和第三列以及第四列的表,这两列包含与列 Conflict 中的值对应的数据。

下面的代码实现了这个目标。然而,这段代码相对较长,特别是因为我希望将其扩展到覆盖列 conflict 中的数百个不同值。

谢谢您提前的任何指导。

英文:

I am trying to create a table with the headings study_id, Conflict and create a third and forth column which contains data that corresponds to the value in column: Conflict

The code below achieves this aim. However it is rather long especially as I wish to expand this to cover several hundred different values in column: conflict

Thank you in advance for any pointers

df <- data.frame(study_id=c("1", "1", "4", "4", "5"), 
                 Conflict=c("WATER.START", "WATER.STOP", "OIL.START", "NA", "WATER.STOP"), 
                 Result=c("TRUE", "TRUE", "TRUE", "NA", "TRUE"))

df2 <- data.frame(study_id=c("1", "2", "3", "4", "5"), 
                  WATER.start=c(1, 1, 2, NA, 6), 
                  WATER.truestart=c(1, 1, 2, NA, 25), 
                  WATER.stop=c(33, 3, 2, NA, 8), 
                  WATER.truestop= c(34, 4, 2, NA, 8))

final <- left_join(df, df2, by ='study_id')

dd <- final %>% filter(Result == "TRUE" & Conflict == "WATER.START")
dd <- dd %>% subset(., Conflict == "WATER.START", 
                    select=c(study_id, Conflict, WATER.start, WATER.truestart))
dd <- dd %>% rename(initial=WATER.start) %>% rename(verification=WATER.truestart)

ee <- final %>% filter(Result == "TRUE" & Conflict == "WATER.STOP")
ee <- ee %>% subset(., Conflict == "WATER.STOP", 
                    select=c(study_id, Conflict, WATER.stop, WATER.truestop))
ee <- ee %>% rename(initial=WATER.stop) %>% rename(verification=WATER.truestop)

ff <- bind_rows(dd, ee)
gg <- ff %>% select(study_id, Conflict, initial, verification)

gg 
#   study_id    Conflict initial verification
# 1        1 WATER.START       1            1
# 2        1  WATER.STOP      33           34
# 3        5  WATER.STOP       8            8

答案1

得分: 2

以下是您要翻译的内容:

First, as it seems you only focus on c('WATER.START', 'WATER.STOP'), subset your first df. Next, for every MARGIN=1 (i.e. every row), we apply an anonymous function \(x) that selects the appropriate columns of df2, using tolower to match case, and cbinds together. Finally rename columns using setNames and rbind the resulting list.

首先,似乎您只关注c('WATER.START', 'WATER.STOP'),对第一个df进行subset。接下来,对于每个MARGIN=1(即每一行),我们使用匿名函数\(x)选择df2的适当列,使用tolower进行大小写匹配,并使用cbind将它们组合在一起。最后使用setNames重命名列并使用rbind合并结果列表。

subset(df, Conflict %in% c('WATER.START', 'WATER.STOP')) |>
apply(MARGIN=1, (x) {
mt <- match(tolower(x[2]), tolower(names(df2)))
cbind(t(x[1:2]), df2[df2$study_id == x[1], c(mt, mt + 1)]) |>
setNames(c('study_id', 'Conflict', 'initial', 'verification'))
}) |> do.call(what=rbind)

study_id Conflict initial verification

1 1 WATER.START 1 1

2 1 WATER.STOP 33 34

5 5 WATER.STOP 8 8

You could also use a dictionary a (which may be expanded to other levels you might be using).

您还可以使用字典a(可以扩展到您可能使用的其他级别)。

a <- c(WATER.START='WATER.start', WATER.STOP='WATER.stop')

subset(df, Conflict %in% c('WATER.START', 'WATER.STOP')) |>
apply(MARGIN=1, (x) {
mt <- match(a[match(x[2], names(a))], names(df2))
cbind(t(x[1:2]), df2[df2$study_id == x[1], c(mt, mt + 1)]) |>
setNames(c('study_id', 'Conflict', 'initial', 'verification'))
}) |> do.call(what=rbind)

study_id Conflict initial verification

1 1 WATER.START 1 1

2 1 WATER.STOP 33 34

5 5 WATER.STOP 8 8

However, I think what you really need is to reshape your data.

然而,我认为您真正需要的是对数据进行reshape

basic

reshape(df2, direction='long', idvar=1, varying=list(c(2, 4), c(3, 5)))

study_id time WATER.start WATER.truestart

1.1 1 1 1 1

2.1 2 1 1 1

3.1 3 1 2 2

4.1 4 1 NA NA

5.1 5 1 6 25

1.2 1 2 33 34

2.2 2 2 3 4

3.2 3 2 2 2

4.2 4 2 NA NA

5.2 5 2 8 8

enhanced

reshape(df2, dir='long', idvar='study_id',
varying=list(c("WATER.start", "WATER.stop"), c("WATER.truestart", "WATER.truestop")),
timevar='foo', times=c('water.start', 'water.stop'), v.names=c('initial', 'verification'))

study_id foo initial verification

1.water.start 1 water.start 1 1

2.water.start 2 water.start 1 1

3.water.start 3 water.start 2 2

4.water.start 4 water.start NA NA

5.water.start 5 water.start 6 25

1.water.stop 1 water.stop 33 34

2.water.stop 2 water.stop 3 4

3.water.stop 3 water.stop 2 2

4.water.stop 4 water.stop NA NA

5.water.stop 5 water.stop 8 8

英文:

First, as it seems you only focus on c(&#39;WATER.START&#39;, &#39;WATER.STOP&#39;), subset your first df. Next, for every MARGIN=1 (i.e. every row), we apply an anonymous function \(x) that selects the appropriate columns of df2, using tolower to match case, and cbinds together. Finally rename columns using setNames and rbind the resulting list.

subset(df, Conflict %in% c(&#39;WATER.START&#39;, &#39;WATER.STOP&#39;)) |&gt;
  apply(MARGIN=1, \(x) {
    mt &lt;- match(tolower(x[2]), tolower(names(df2)))
    cbind(t(x[1:2]), df2[df2$study_id == x[1], c(mt, mt + 1)]) |&gt;
    setNames(c(&#39;study_id&#39;, &#39;Conflict&#39;, &#39;initial&#39;, &#39;verification&#39;))
}) |&gt; do.call(what=rbind)
#   study_id    Conflict initial verification
# 1        1 WATER.START       1            1
# 2        1  WATER.STOP      33           34
# 5        5  WATER.STOP       8            8

You could also use a dictionary a (which may be expanded to other levels you might be using).

a &lt;- c(WATER.START=&#39;WATER.start&#39;, WATER.STOP=&#39;WATER.stop&#39;)

subset(df, Conflict %in% c(&#39;WATER.START&#39;, &#39;WATER.STOP&#39;)) |&gt;
  apply(MARGIN=1, \(x) {
    mt &lt;- match(a[match(x[2], names(a))], names(df2))
    cbind(t(x[1:2]), df2[df2$study_id == x[1], c(mt, mt + 1)]) |&gt;
      setNames(c(&#39;study_id&#39;, &#39;Conflict&#39;, &#39;initial&#39;, &#39;verification&#39;))
  }) |&gt; do.call(what=rbind)
#   study_id    Conflict initial verification
# 1        1 WATER.START       1            1
# 2        1  WATER.STOP      33           34
# 5        5  WATER.STOP       8            8

However, I think what you really need is to reshape your data.

## basic
reshape(df2, direction=&#39;long&#39;, idvar=1, varying=list(c(2, 4), c(3, 5))) 
#     study_id time WATER.start WATER.truestart
# 1.1        1    1           1               1
# 2.1        2    1           1               1
# 3.1        3    1           2               2
# 4.1        4    1          NA              NA
# 5.1        5    1           6              25
# 1.2        1    2          33              34
# 2.2        2    2           3               4
# 3.2        3    2           2               2
# 4.2        4    2          NA              NA
# 5.2        5    2           8               8

## enhanced
reshape(df2, dir=&#39;long&#39;, idvar=&#39;study_id&#39;,
        varying=list(c(&quot;WATER.start&quot;, &quot;WATER.stop&quot;), c(&quot;WATER.truestart&quot;, &quot;WATER.truestop&quot;)),
        timevar=&#39;foo&#39;, times=c(&#39;water.start&#39;, &#39;water.stop&#39;), v.names=c(&#39;initial&#39;, &#39;verification&#39;)) 
#               study_id         foo initial verification
# 1.water.start        1 water.start       1            1
# 2.water.start        2 water.start       1            1
# 3.water.start        3 water.start       2            2
# 4.water.start        4 water.start      NA           NA
# 5.water.start        5 water.start       6           25
# 1.water.stop         1  water.stop      33           34
# 2.water.stop         2  water.stop       3            4
# 3.water.stop         3  water.stop       2            2
# 4.water.stop         4  water.stop      NA           NA
# 5.water.stop         5  water.stop       8            8

huangapple
  • 本文由 发表于 2023年2月18日 21:18:18
  • 转载请务必保留本文链接:https://go.coder-hub.com/75493596.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定