Importing a CSV into R and removing the note rows at both the beginning and the middle

Question

I have several CSV files recorded by air sensors (TSI Bluesky and AirAssure). The device records the data to its SD card. As with many machine-recorded files, the first 59 lines are notes that start with # and record basic information such as serial numbers. These notes are easy to skip by adding skip=59. However, the notes can also appear in the middle of a CSV file, interrupting the records, and the column names are then repeated. Here is an example.

#note
#note
#note
#note
col1 col2 col3
unit1 unit2 unit3
1 2 3
1 2 3
1 2 3
#note
#note
#note
#note
col1 col2 col3
unit1 unit2 unit3
1 2 3
1 2 3
1 2 3

How can I skip all the note and unit rows and keep only one set of column names and all the numbers?

Answer 1

Score: 2

This code reads the data from a text string. If you are loading the CSV file from a folder instead, check whether the separator is "\t" or " ".
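One way to check the separator is to look at the first non-comment line of the file. The sketch below is only an illustration: the temporary file and its tab-separated contents are made up here as a stand-in for a real sensor file, and the variable names are hypothetical.

```r
# Write a small tab-separated sample to a temporary file
# (an assumed stand-in for a real sensor file on disk).
path <- tempfile(fileext = ".csv")
writeLines(c("#note", "col1\tcol2\tcol3", "unit1\tunit2\tunit3", "1\t2\t3"), path)

# Read the raw lines and drop the note lines that start with '#'.
raw <- readLines(path)
body <- raw[!startsWith(raw, "#")]

# Inspect the first non-comment line: which candidate delimiters does it contain?
first <- body[1]
has_tab   <- grepl("\t", first, fixed = TRUE)
has_comma <- grepl(",", first, fixed = TRUE)
has_space <- grepl(" ", first, fixed = TRUE)
print(c(tab = has_tab, comma = has_comma, space = has_space))
```

Whichever delimiter shows up is the value to pass as sep. If the fields are separated by runs of spaces or tabs, sep = "" (any whitespace) should cover both.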

The comment.char parameter filters out the note lines, i.e. every line that starts with #.
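To see what comment.char does on its own, here is a minimal sketch on a shortened version of the sample (sample_text and df_raw are names made up for this illustration). It assumes whitespace-separated fields, hence sep = "". The note lines disappear, but the unit row and the repeated header row are still read as ordinary data rows, which is why an extra filtering step is needed.

```r
# Shortened sample: notes, header, units, data, then the same again mid-file.
sample_text <- "#note
col1 col2 col3
unit1 unit2 unit3
1 2 3
#note
col1 col2 col3
unit1 unit2 unit3
1 2 3
"

# comment.char = "#" drops the note lines; sep = "" splits on any whitespace.
df_raw <- read.csv(text = sample_text, comment.char = "#", sep = "")

# The unit rows and the repeated header row are still present as data.
print(df_raw)
```

Because of those leftover text rows, the columns are not read as numbers at this point.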

    # Sample data: '#' note lines, then a header row and a unit row,
    # with the same block repeated in the middle of the file.
    text <-
    "
    #note
    #note
    #note
    #note
    col1 col2 col3
    unit1 unit2 unit3
    1 2 3
    1 2 3
    1 2 3
    #note
    #note
    #note
    #note
    col1 col2 col3
    unit1 unit2 unit3
    1 2 3
    1 2 3
    1 2 3
    "
    library(dplyr)

    # comment.char = "#" drops the note lines. sep = "" splits on any whitespace,
    # which matches the sample as shown; use sep = "\t" for a tab-separated file.
    df <- read.csv(text = text, comment.char = "#", sep = "")

    # Drop the unit rows and the repeated header row.
    filter(df, !col1 %in% c("col1", "unit1"))

Output:

      col1 col2 col3
    1    1    2    3
    2    1    2    3
    3    1    2    3
    4    1    2    3
    5    1    2    3
    6    1    2    3
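Since the question mentions several files, the same idea can be wrapped in a function and applied to each file. This is only a sketch under stated assumptions: read_sensor_file is a hypothetical helper name, the temporary files stand in for real sensor files, and the code assumes that a unit row always comes directly after a header row. It uses base R only and converts the remaining columns to numbers with type.convert.

```r
# Hypothetical helper: read one file, drop '#' notes plus the repeated
# header and unit rows, and return properly typed columns.
read_sensor_file <- function(path, sep = "") {
  df <- read.table(path, header = TRUE, sep = sep, comment.char = "#",
                   colClasses = "character", check.names = FALSE)
  # Rows whose first field equals the first column name are repeated headers.
  is_header <- df[[1]] %in% names(df)[1]
  # Assumption: a unit row directly follows every header row,
  # including the real header (so the first data row is a unit row).
  is_unit <- c(TRUE, head(is_header, -1))
  df <- df[!(is_header | is_unit), , drop = FALSE]
  rownames(df) <- NULL
  type.convert(df, as.is = TRUE)
}

# Two temporary files with the same made-up content as the sample.
sample_lines <- c("#note", "#note", "col1 col2 col3", "unit1 unit2 unit3",
                  "1 2 3", "1 2 3", "#note", "col1 col2 col3",
                  "unit1 unit2 unit3", "1 2 3")
paths <- c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
for (p in paths) writeLines(sample_lines, p)

# Read every file and stack the results into one data frame.
combined <- do.call(rbind, lapply(paths, read_sensor_file))
print(combined)
```

Flagging rows by position rather than by the literal value 'unit1' avoids hard-coding the unit names, but it only works if the header/unit pairing really is consistent in your files. For comma-separated files, pass sep = ",".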

huangapple
  • Posted on July 7, 2023, 03:41:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76632089.html