Importing a CSV into R and removing the note rows at both the beginning and the middle

Question

I have several CSV files recorded by air sensors (TSI Bluesky and AirAssure). The devices record the data to their SD cards. As with many machine-recorded files, the first 59 lines are notes starting with # that record basic information such as serial numbers. These notes are easy to skip by adding skip=59. However, the same notes can also appear in the middle of a file, interrupting the records, and the column names are then repeated. Here is an example:

#note
#note
#note
#note
col1 col2 col3
unit1 unit2 unit3
1 2 3
1 2 3
1 2 3
#note
#note
#note
#note
col1 col2 col3
unit1 unit2 unit3
1 2 3
1 2 3
1 2 3

How can I skip all the note and unit rows and keep only one row of column names plus all the numbers?

Answer 1

Score: 2

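This code reads the data from a string. If you load the CSV file from a folder instead, pass the file path to read.csv and check that sep matches the file's actual separator ("\t", " ", or the default "," for a comma-separated file).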

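The comment.char parameter makes read.csv ignore everything from # to the end of a line, so the #note lines are dropped. The unit row and the repeated column-name and unit rows are still read in as data, which is what the filter() call removes.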

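library(dplyr)

# Sample data; the "\t" escapes keep the tab separators intact when copied
text <- "
#note
#note
#note
#note
col1\tcol2\tcol3
unit1\tunit2\tunit3
1\t2\t3
1\t2\t3
1\t2\t3
#note
#note
#note
#note
col1\tcol2\tcol3
unit1\tunit2\tunit3
1\t2\t3
1\t2\t3
1\t2\t3
"

# comment.char = "#" discards the #note lines; the first remaining line
# becomes the header, and the unit row plus the repeated header/unit rows
# are read in as ordinary data rows
df <- read.csv(text = text, comment.char = "#", sep = "\t")

# Drop the leftover header and unit rows
filter(df, !col1 %in% c('col1', 'unit1'))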

Output:

  col1 col2 col3
1    1    2    3
2    1    2    3
3    1    2    3
4    1    2    3
5    1    2    3
6    1    2    3
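The filter above hard-codes 'col1' and 'unit1', and every column stays character because the text rows were read into the same columns as the numbers. Below is a minimal base-R sketch that takes both markers from the file itself and converts the columns back to numbers. It assumes, as in the example, that the first data row after the header is always the unit row. The temporary file and its contents are hypothetical stand-ins for a real sensor export.

```r
# Hypothetical sample file: comma-separated, with a comment block, header,
# unit row and values, then the same pattern repeated mid-file
path <- tempfile(fileext = ".csv")
writeLines(c(
  "#note", "#note",
  "col1,col2,col3",
  "unit1,unit2,unit3",
  "1,2,3",
  "4,5,6",
  "#note",
  "col1,col2,col3",
  "unit1,unit2,unit3",
  "7,8,9"
), path)

# comment.char drops the #note lines; check.names = FALSE keeps the header
# text exactly as written so it can be matched against the repeated copies
df <- read.csv(path, comment.char = "#", check.names = FALSE)

# Assumption: the first data row is the unit row, so its first value
# identifies every unit row in the file
unit_marker <- df[[1]][1]

# Drop repeated header rows and unit rows using values taken from the file
# rather than hard-coded strings
clean <- df[!df[[1]] %in% c(names(df)[1], unit_marker), ]

# The text rows forced every column to character; convert each one back
clean[] <- lapply(clean, type.convert, as.is = TRUE)
rownames(clean) <- NULL
clean
```

check.names = FALSE matters for real headers containing spaces or parentheses: read.csv would otherwise rewrite them, and the rewritten names would no longer match the repeated header rows in the data.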

  • Posted by huangapple on 2023-07-07 03:41:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76632089.html