Writing an object of classes ‘data.table’ and 'data.frame' to an external file



A program that I am using in R returns an object of classes ‘data.table’ and 'data.frame'.

Its str() output looks like this:

Classes ‘data.table’ and 'data.frame':  106876 obs. of  17 variables:
$ col1  : num  1 2 3 4  ...
$ col2  : chr  "Chr00c00001" "Chr00c00001" "Chr00c00001" "Chr00c00001" ...
$ col3  : num  1 2 3 4 ...
$ col4 :List of 106876
..$ : chr
..$ : chr
..$ : chr 
..$ : chr "Chr1g00005011"
.. [list output truncated]
$ col5 :List of 106876
..$ : chr "Chr1g00000491"
..$ : chr
..$ : chr
..$ : chr "Chr1g00000501"
.. [list output truncated]

I would like to write this to a text file with a function such as write.table, so that each col is a column and its data fill the rows, giving something like this:

col1    col2      col3       col4       col5
  1  Chr00c00001   1                   Chr1g00000491
  2  Chr00c00001   2
  3  Chr00c00001   3
  4  Chr00c00001   4    Chr1g00005011  Chr1g00000501

I am not familiar with objects of classes data.table and data.frame that contain lists, apparently with further lists inside the elements of those lists. It would be great if someone could tell me what kind of object I have and how to convert it into a format that I can write to a text file.


Score: 1






What you have is a frame with "list-columns", also described as "nested data". This structure is useful for many things, but many functions that work well on non-nested data.frame-like objects do not know how to handle list-columns. There is a good reason for that: simple (non-list) columns are just vectors, so anything that works on a vector works on a column of a frame. With a list-column, however, as long as the length of the list equals the number of rows in the frame, each element can hold anything. This includes NULL, arbitrary-length vectors, graphic objects (grobs), other data.frame-like objects, arbitrarily nested lists, etc.

In your case, though, it looks like the elements of your list-columns are vectors of length 0 or 1. Typical unnesting of such data may drop the rows whose elements have length 0, so we need to take a little care and replace empty elements with something reasonable, whether NA or an empty string (since your list-columns appear to be string-based).
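To illustrate the row-dropping concern, here is a minimal sketch on a hypothetical toy tibble (the names df, id, and val are mine, not from your data). It relies on the keep_empty argument of tidyr::unnest, which defaults to FALSE.

```r
library(tidyr)

# Hypothetical toy data: the second element of the list-column is empty.
df <- tibble::tibble(id = 1:3, val = list("a", NULL, "c"))

# Default behaviour: the row with the NULL element is dropped.
dropped <- unnest(df, val)
nrow(dropped)

# keep_empty = TRUE keeps that row and fills it with NA.
kept <- unnest(df, val, keep_empty = TRUE)
nrow(kept)
```

Replacing the empty elements yourself before unnesting, as done further down, has the same effect while letting you choose the fill value.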

I think your data looks similar to this:

library(data.table)

obj <- data.table(col1=1:4, col2="c001", col3=11:14, col4=list(NULL, NULL, NULL, "5011"), col5=list("491", NULL, NULL, "501"))
obj
#     col1   col2  col3   col4   col5
#    <int> <char> <int> <list> <list>
# 1:     1   c001    11           491
# 2:     2   c001    12              
# 3:     3   c001    13              
# 4:     4   c001    14   5011    501
str(obj)
# Classes 'data.table' and 'data.frame':	4 obs. of  5 variables:
#  $ col1: int  1 2 3 4
#  $ col2: chr  "c001" "c001" "c001" "c001"
#  $ col3: int  11 12 13 14
#  $ col4:List of 4
#   ..$ : NULL
#   ..$ : NULL
#   ..$ : NULL
#   ..$ : chr "5011"
#  $ col5:List of 4
#   ..$ : chr "491"
#   ..$ : NULL
#   ..$ : NULL
#   ..$ : chr "501"

I recognize that this sample explicitly shows NULL whereas yours does not. This changes nothing: if your data has "" instead of NULL, then my "guard against length 0" step will do no harm.

I think it is safest to first confirm that what we have will reduce simply. That is, if any of the elements are length-0 (as I mentioned above), we need to make them length-1 with some sentinel value of emptiness. If any element is length 2 or more, though, that row would need to be repeated once per value. If this is known and desired behavior, then all is good; if not, you need to think about how to aggregate/reduce the data, e.g., min, mean, first, last, or sample.
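One way to run that check is sketched below on the sample data. The helper names list_cols, len_tab, and all_short are hypothetical, and the approach (tabulating element lengths with lengths()) is only one of several that would work.

```r
library(data.table)

obj <- data.table(col1 = 1:4, col2 = "c001", col3 = 11:14,
                  col4 = list(NULL, NULL, NULL, "5011"),
                  col5 = list("491", NULL, NULL, "501"))

# Names of the list-columns.
list_cols <- names(obj)[sapply(obj, is.list)]

# Matrix of element lengths: one row per frame row, one column per list-column.
len_tab <- sapply(list_cols, function(nm) lengths(obj[[nm]]))

# How often does each element length occur?
table(len_tab)

# TRUE when no element would expand into more than one row.
all_short <- all(len_tab <= 1L)
all_short
```

If all_short is FALSE on your real data, decide whether row expansion or some aggregation is what you want before going further.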

Another note: you have two (or more) list-columns. The solution becomes a lot murkier if you have length > 1 in one list-column and a different length in another list-column. If they are both the same length, then we may be good with the assumption that two list-columns with length-n elements in the same rows should expand the same number of rows. However, if they are both length > 1 but different lengths, then ... do we do a cartesian expansion? Truncation? Lots of ways this can go wrong. (I won't "fix" this condition here.)
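The sketch below does not resolve that situation; it only flags the rows where it occurs, so you can inspect them. The data and the names dt, a, b, and conflict_rows are hypothetical.

```r
library(data.table)

# Hypothetical data: row 2 holds 2 values in `a` but 3 values in `b`.
dt <- data.table(id = 1:3,
                 a = list("x", c("p", "q"), NULL),
                 b = list("y", c("r", "s", "t"), "z"))

len_a <- lengths(dt$a)
len_b <- lengths(dt$b)

# Rows where both list-columns are multi-valued but disagree in length.
conflict_rows <- which(len_a > 1 & len_b > 1 & len_a != len_b)
conflict_rows
```

An empty result means the multi-valued rows, if any, at least agree in length across the two columns.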

For now, I'll assume:

  • length 0 elements should be NA (actually NA_character_);
  • length 2+ elements should expand the number of rows.

Again, if your data are all length-1 vectors then this will do no harm.

# names of the list-columns (here col4 and col5)
islist <- names(obj)[sapply(obj, is.list)]
# replace every length-0 element with NA
obj[, (islist) := lapply(.SD, function(z) replace(z, !sapply(z, length), NA)), .SDcols = islist]
obj
#     col1   col2  col3   col4   col5
#    <int> <char> <int> <list> <list>
# 1:     1   c001    11     NA    491
# 2:     2   c001    12     NA     NA
# 3:     3   c001    13     NA     NA
# 4:     4   c001    14   5011    501

From here, we can use tidyr::unnest:

tidyr::unnest(obj, c(col4, col5))
# # A tibble: 4 × 5
#    col1 col2   col3 col4  col5 
#   <int> <chr> <int> <chr> <chr>
# 1     1 c001     11 <NA>  491  
# 2     2 c001     12 <NA>  <NA> 
# 3     3 c001     13 <NA>  <NA> 
# 4     4 c001     14 5011  501  

Notice that this converted it from class data.table to class tbl_df; if you intend to continue using the data.table dialect of working on frames, then you'll need either as.data.table or setDT here.
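To finish the task from the question, the flattened frame can then be written with write.table. The following is a minimal end-to-end sketch on the sample data. The temporary file path, the tab separator, and writing NA as an empty field are my assumptions, not requirements stated in the question.

```r
library(data.table)

obj <- data.table(col1 = 1:4, col2 = "c001", col3 = 11:14,
                  col4 = list(NULL, NULL, NULL, "5011"),
                  col5 = list("491", NULL, NULL, "501"))

# Guard against length-0 elements so that unnesting keeps every row.
islist <- names(obj)[sapply(obj, is.list)]
obj[, (islist) := lapply(.SD, function(z) replace(z, lengths(z) == 0L, NA_character_)),
    .SDcols = islist]

# Unnest, then convert the resulting tibble back to a data.table.
flat <- as.data.table(tidyr::unnest(obj, c(col4, col5)))

# Assumed output location and separator; adjust both as needed.
out_file <- tempfile(fileext = ".tsv")
write.table(flat, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, na = "")
```

Since flat is a data.table again, data.table::fwrite(flat, out_file, sep = "\t") is an alternative that is usually faster on large frames.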

  • Posted on April 17, 2023, 12:43:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76031777.html



