Generalised splitting string column into multiple columns with data.table

Question

Is there a way to generalise this answer (https://stackoverflow.com/a/33127773/22295881) to the problem of splitting a string column into multiple columns with data.table?

It would be great to have a solution that works for any user-provided column rather than only the column named "type".

I tried to loop on column names e.g.:

library(data.table)

dtToSplit = data.table(attr = c(1,30,4,6),
                       typeA=c('foo_and_bar','foo_and_bar_2'),
                       typeB=c('cat_and_dog', 'orange_and_apple'))
namesSpl <- c('typeA', 'typeB')
for (indN in namesSpl) {
  dtToSplit[, paste0(indN, 1:2) := tstrsplit(.(indN), "_and_")]
}

Instead of splitting the strings I get:

   attr         typeA            typeB typeA1 typeA2 typeB1 typeB2
1:    1   foo_and_bar      cat_and_dog  typeA  typeA  typeB  typeB
2:   30 foo_and_bar_2 orange_and_apple  typeA  typeA  typeB  typeB
3:    4   foo_and_bar      cat_and_dog  typeA  typeA  typeB  typeB
4:    6 foo_and_bar_2 orange_and_apple  typeA  typeA  typeB  typeB

Maybe a loop is not the best idea?

Answer 1

Score: 0

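Use get(indN) instead of .(indN). Inside a data.table call, .(indN) is just list(indN), so tstrsplit() receives the literal string stored in indN (e.g. "typeA") rather than the column's values; get(indN) looks up the column with that name: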

for (indN in namesSpl) {
  dtToSplit[, paste0(indN, 1:2) := tstrsplit(get(indN), "_and_")]
}
dtToSplit
#     attr         typeA            typeB typeA1 typeA2 typeB1 typeB2
#    <num>        <char>           <char> <char> <char> <char> <char>
# 1:     1   foo_and_bar      cat_and_dog    foo    bar    cat    dog
# 2:    30 foo_and_bar_2 orange_and_apple    foo  bar_2 orange  apple
# 3:     4   foo_and_bar      cat_and_dog    foo    bar    cat    dog
# 4:     6 foo_and_bar_2 orange_and_apple    foo  bar_2 orange  apple
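
If the goal is a reusable solution for arbitrary user-provided columns, the loop could also be wrapped in a small function. The following is only a sketch: split_columns is a hypothetical helper name (not part of data.table), it extracts each column with dt[[col]] instead of get(), and it assumes the separator should be matched literally (fixed = TRUE). The sample vectors are written out at full length with rep() so the example does not rely on recycling.

```r
library(data.table)

# Hypothetical helper (not part of data.table): split every column named in
# `cols` on `sep` and add the pieces as new columns <col>1, <col>2, ...
split_columns <- function(dt, cols, sep, fixed = TRUE) {
  for (col in cols) {
    # dt[[col]] extracts the column by its name, so get() is not needed
    parts <- tstrsplit(dt[[col]], sep, fixed = fixed)
    # set() adds the new columns by reference; one column per split piece
    set(dt, j = paste0(col, seq_along(parts)), value = parts)
  }
  invisible(dt)
}

dtToSplit <- data.table(attr  = c(1, 30, 4, 6),
                        typeA = rep(c("foo_and_bar", "foo_and_bar_2"), 2),
                        typeB = rep(c("cat_and_dog", "orange_and_apple"), 2))

split_columns(dtToSplit, c("typeA", "typeB"), "_and_")
```

Because the number of new columns is taken from the split result rather than hard-coded as 1:2, the same call should also cope with columns that split into more than two pieces.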

Posted by huangapple on 2023-07-28 00:43:07
Original link: https://go.coder-hub.com/76781849.html
