Generalised splitting of a string column into multiple columns with data.table

Question

Is there a way to generalise this answer (https://stackoverflow.com/a/33127773/22295881) to the problem of splitting a string column into multiple columns with data.table?

It would be great to have a solution that works for any user-provided column rather than only a column named "type".

I tried looping over the column names, e.g.:

    dtToSplit = data.table(attr = c(1,30,4,6),
                           typeA=c('foo_and_bar','foo_and_bar_2'),
                           typeB=c('cat_and_dog', 'orange_and_apple'))
    namesSpl <- c('typeA', 'typeB')
    for (indN in namesSpl) {
      dtToSplit[, paste0(indN, 1:2) := tstrsplit(.(indN), "_and_")]
    }

Instead of splitting the strings I get:

       attr         typeA            typeB typeA1 typeA2 typeB1 typeB2
    1:    1   foo_and_bar      cat_and_dog  typeA  typeA  typeB  typeB
    2:   30 foo_and_bar_2 orange_and_apple  typeA  typeA  typeB  typeB
    3:    4   foo_and_bar      cat_and_dog  typeA  typeA  typeB  typeB
    4:    6 foo_and_bar_2 orange_and_apple  typeA  typeA  typeB  typeB

Maybe a loop is not the best idea?

Answer 1

Score: 0

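Use get(indN) instead of .(indN). In data.table, .() is an alias for list(), so .(indN) passes the column name itself as a string, whereas get(indN) looks up the column with that name: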

    for (indN in namesSpl) {
      dtToSplit[, paste0(indN, 1:2) := tstrsplit(get(indN), "_and_")]
    }
    dtToSplit
    #     attr         typeA            typeB typeA1 typeA2 typeB1 typeB2
    #    <num>        <char>           <char> <char> <char> <char> <char>
    # 1:     1   foo_and_bar      cat_and_dog    foo    bar    cat    dog
    # 2:    30 foo_and_bar_2 orange_and_apple    foo  bar_2 orange  apple
    # 3:     4   foo_and_bar      cat_and_dog    foo    bar    cat    dog
    # 4:     6 foo_and_bar_2 orange_and_apple    foo  bar_2 orange  apple
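
Since the question asks for something that works with any user-provided column, the loop could also be wrapped in a small function. The following is only a sketch under a few assumptions: the helper name split_cols and its arguments are hypothetical (they are not part of data.table or of the answer above), the separator is treated as a fixed string rather than a regular expression, and the example data repeats the two strings with rep() so that all columns explicitly have four rows. It uses dt[[col]] to look up the column by name and set() to add the new columns by reference.

```r
library(data.table)

# Hypothetical helper (not from the original answer): splits every column
# named in `cols` on `sep` and adds the pieces as <col>1, <col>2, ...
# The data.table is modified by reference.
split_cols <- function(dt, cols, sep, fixed = TRUE) {
  for (col in cols) {
    # dt[[col]] fetches the column by its name, so get() is not needed
    parts <- tstrsplit(dt[[col]], sep, fixed = fixed)
    # one new column per piece; the number of pieces is taken from the data
    set(dt, j = paste0(col, seq_along(parts)), value = parts)
  }
  invisible(dt)
}

# Same data as in the question, with the recycling written out explicitly
dtToSplit <- data.table(
  attr  = c(1, 30, 4, 6),
  typeA = rep(c("foo_and_bar", "foo_and_bar_2"), 2),
  typeB = rep(c("cat_and_dog", "orange_and_apple"), 2)
)

split_cols(dtToSplit, c("typeA", "typeB"), "_and_")
print(dtToSplit)
```

Unlike paste0(indN, 1:2), this sketch does not hard-code two output columns; a string with more separators would simply produce more columns, with shorter rows filled with NA by tstrsplit.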

Posted by huangapple on July 28, 2023 at 00:43:07
When reposting, please keep this link: https://go.coder-hub.com/76781849.html