Reading a data file containing multiple IDs into different CSVs

Question

Given a file with the following data structure:

FIXED=0
LINES=1
POINTS=5
390 397
390 396
389 395
389 394
388 393
IMAGE=Name1.jpg
ID=1 
FIXED=0
LINES=1
POINTS=4
255 503
256 502
256 501
256 500
IMAGE=Name2.jpg
ID=2 
FIXED=0
LINES=1
POINTS=6
262 431
262 430
262 429
262 428
262 427
262 426
IMAGE=Name3.jpg
ID=3 

Where:

  • The lines between FIXED and ID belong to an individual
  • The numbers represent two columns of variables

How would we read in the data and then transform it into individual .csv files, where:

  • The name of each .csv is the name that follows IMAGE=, without the extension: Name1, Name2, Name3, ...
  • The first column of Name1.csv is the first column of numbers (390 390 389 389 388)
  • The second column of Name1.csv is the second column of numbers (397 396 395 394 393)
  • The same applies to Name2.csv, Name3.csv, and so on
  • FIXED=0, LINES=1, POINTS=5 and ID=1 can be discarded

Please note that the number of rows between POINTS and IMAGE is not constant.
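
To make the example reproducible, here is a minimal sketch that writes the first two records of the sample to a temporary text file. The file name sample.txt and the variable names are placeholders chosen for illustration, not part of the original data.

```r
# Write the first two records of the sample to a file (the name is a placeholder)
sample_lines <- c(
  "FIXED=0", "LINES=1", "POINTS=5",
  "390 397", "390 396", "389 395", "389 394", "388 393",
  "IMAGE=Name1.jpg", "ID=1",
  "FIXED=0", "LINES=1", "POINTS=4",
  "255 503", "256 502", "256 501", "256 500",
  "IMAGE=Name2.jpg", "ID=2"
)
sample_file <- file.path(tempdir(), "sample.txt")
writeLines(sample_lines, sample_file)

# Each record runs from a FIXED= line through the following ID= line
n_records <- sum(startsWith(sample_lines, "FIXED="))
```

Note that the point rows come before the IMAGE= line that names them, so a reader has to collect the rows first and learn the output name afterwards.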

Answer 1

Score: 1

You could try this method:

library(stringr)

# Read the data from file
data <- readLines("your_file.txt")

# Rows collected for the individual currently being read
current_data <- NULL

# Process each line of the data
for (line in data) {
  # Drop stray whitespace such as the trailing space after "ID=1"
  line <- str_trim(line)

  if (str_starts(line, "FIXED=")) {
    # A new individual starts here, so discard anything left over
    current_data <- NULL

  } else if (str_detect(line, "^\\d+\\s+\\d+$")) {
    # Split the line into its two numbers and append them as a new row
    numbers <- str_split(line, "\\s+")[[1]]
    current_data <- rbind(current_data, as.numeric(numbers))

  } else if (str_starts(line, "IMAGE=")) {
    # The IMAGE= line follows the points, so it names the rows just read
    individual_name <- str_remove(line, "^IMAGE=")
    individual_name <- str_remove(individual_name, "\\.jpg$")

    # Save the collected rows, if any, to <name>.csv
    if (!is.null(current_data)) {
      csv_file <- paste0(individual_name, ".csv")
      write.csv(current_data, file = csv_file, row.names = FALSE)
    }
    current_data <- NULL
  }
  # LINES=, POINTS= and ID= lines carry no data and are skipped
}
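
As an alternative, here is a rough sketch in base R that groups the lines into records first and then writes one CSV per record. The function name split_records, its out_dir argument, and the column names x and y are my own assumptions for illustration; the question does not specify column headers. It also assumes every record begins with a FIXED= line and contains exactly one IMAGE= line.

```r
# Hypothetical helper: split a records file into one CSV per IMAGE= name
split_records <- function(path, out_dir = ".") {
  lines <- trimws(readLines(path))

  # A new record starts at every FIXED= line
  record_id <- cumsum(startsWith(lines, "FIXED="))
  records <- split(lines, record_id)

  written <- character(0)
  for (rec in records) {
    image_line <- grep("^IMAGE=", rec, value = TRUE)
    point_lines <- grep("^[0-9]+[[:space:]]+[0-9]+$", rec, value = TRUE)

    # Skip anything that is not a complete record
    if (length(image_line) != 1 || length(point_lines) == 0) next

    # Strip the IMAGE= prefix and the file extension to get the CSV name
    name <- sub("\\.[^.]*$", "", sub("^IMAGE=", "", image_line))

    # Parse the "x y" pairs into a two-column data frame
    # (the column names x and y are an assumption)
    points <- read.table(text = point_lines, col.names = c("x", "y"))

    out_file <- file.path(out_dir, paste0(name, ".csv"))
    write.csv(points, out_file, row.names = FALSE)
    written <- c(written, out_file)
  }
  invisible(written)
}

# Usage on a tiny one-record file written to a temporary directory
demo_dir <- tempdir()
demo_file <- file.path(demo_dir, "records.txt")
writeLines(c("FIXED=0", "LINES=1", "POINTS=2", "390 397", "390 396",
             "IMAGE=Name1.jpg", "ID=1 "), demo_file)
split_records(demo_file, out_dir = demo_dir)
result <- read.csv(file.path(demo_dir, "Name1.csv"))
```

Because the rows are matched by pattern instead of counted, the POINTS= value is never needed, which suits the case where the number of rows per record varies.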
