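# Question

I have various .csv files, each with multiple columns. I am using the R code below as a quality check: for each column, it counts how many rows have valid values and how many are null. The code works well on a single CSV file, but I want to run it on all of the CSV files and get output for each one. I would also like a log file. Could anyone help me modify the code so that it can process multiple CSV files?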

```R
# install.packages("readr")  # run once if readr is not installed yet
library(readr)

# Count valid (non-NA) and null (NA) values in one column of a data frame
check_column <- function(df, column) {
  valid_values <- !is.na(df[[column]])
  num_valid <- sum(valid_values)
  num_null <- nrow(df) - num_valid
  return(c(num_valid, num_null))
}

# Read the CSV file
df <- read_csv("data.csv")

for (column in names(df)) {
  results <- check_column(df, column)
  print(paste0(column, ": ", results[1], " valid, ", results[2], " null"))
}
```
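As an aside, for a single data frame the same per-column counts can be computed without a loop by calling base R's `colSums()` on `is.na()`. This is a minimal sketch; the small data frame is made up for illustration:

```r
# Illustrative data frame with some missing values (made-up values)
df <- data.frame(
  D_T   = c("2021-03-01", "2021-03-02", NA),
  Temp  = c(28, NA, 28),
  Press = c(1018, 1017, 1019)
)

# is.na() on a data frame returns a logical matrix;
# colSums() counts the TRUE values column by column
num_null  <- colSums(is.na(df))
num_valid <- nrow(df) - num_null

# One line per column, e.g. "Temp: 2 valid, 1 null"
cat(sprintf("%s: %d valid, %d null\n",
            names(df), as.integer(num_valid), as.integer(num_null)),
    sep = "")
```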

Sample data (not all files have the same number of columns):

Csv1.csv

```
D_T                     Temp (°C)  Press (Pa)  ...
2021-03-01 00:00:00+00  28         1018        ...
2021-03-02 00:00:00+00  27         1017        ...
2021-03-03 00:00:00+00  28         1019        ...
..
..
```

Csv2.csv

```
D_T                     Temp (°C)  Vel (m/s)  Press (Pa)  ...
2022-03-01 00:00:00+00  28         118        1018        ...
2022-03-02 00:00:00+00  27         117        1019        ...
2022-03-03 00:00:00+00  28         119        1018        ...
..
..
```


# Answer 1
**Score**: 1
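How about something like this? It does not store anything in a variable; the counts are written straight to one log file per CSV. Let me know if you need help with it.

```R
library(readr)

# Loop over every file in the working directory whose name ends in ".csv"
for (csv_file in list.files(pattern = "\\.csv$")) {
    df <- read_csv(csv_file)
    # One log per input file, e.g. "Csv1.csv.log"
    out <- file(paste0(csv_file, ".log"), open = "w")
    for (x in colnames(df)) {
        cat(
            paste0(x, ":"),
            sum(!is.na(df[[x]])),
            "valid,",
            sum(is.na(df[[x]])),
            "null\n",
            file = out
        )
    }
    close(out)
}
```

To write everything into a single log file instead:

```R
library(readr)

out <- file("output.log", open = "w")
for (csv_file in list.files(pattern = "\\.csv$")) {
    df <- read_csv(csv_file)
    # File name as a section header in the shared log
    cat(csv_file, "\n", file = out, sep = "")
    for (x in colnames(df)) {
        cat(
            paste0(x, ":"),
            sum(!is.na(df[[x]])),
            "valid,",
            sum(is.na(df[[x]])),
            "null\n",
            file = out
        )
    }
}
close(out)
```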
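If a single table is more convenient than free-text logs, the loop can also collect the counts into a data frame while appending a timestamped line per column to a log. This is a minimal sketch, not part of the original answer: `summarise_nulls()`, `qc.log` and the demo files are hypothetical names, and it swaps readr's `read_csv()` for base R's `read.csv()` so it runs without extra packages.

```r
# Hypothetical helper (name and file names are illustrative): count valid and
# null values per column for every *.csv file in `dir`, append one
# timestamped line per column to a log file, and return a summary data frame.
summarise_nulls <- function(dir = ".", log_file = file.path(dir, "qc.log")) {
  csv_files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

  log_con <- file(log_file, open = "a")  # append, so earlier runs are kept
  on.exit(close(log_con))

  per_file <- lapply(csv_files, function(path) {
    # na.strings = c("", "NA") treats empty cells as null, like readr's default;
    # check.names = FALSE keeps headers such as "Press (Pa)" unchanged
    df <- read.csv(path, na.strings = c("", "NA"), check.names = FALSE)
    nulls <- colSums(is.na(df))
    res <- data.frame(
      file    = basename(path),
      column  = names(df),
      n_valid = as.integer(nrow(df) - nulls),
      n_null  = as.integer(nulls),
      row.names = NULL
    )
    writeLines(sprintf("[%s] %s | %s: %d valid, %d null",
                       format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
                       res$file, res$column, res$n_valid, res$n_null),
               log_con)
    res
  })

  do.call(rbind, per_file)  # NULL if no CSV files were found
}

# Usage with two small made-up files in a temporary folder
tmp <- file.path(tempdir(), "qc_demo")
dir.create(tmp, showWarnings = FALSE)
writeLines(c("D_T,Temp,Press", "2021-03-01,28,1018", "2021-03-02,,1017"),
           file.path(tmp, "Csv1.csv"))
writeLines(c("D_T,Temp,Vel,Press", "2022-03-01,28,118,1018", "2022-03-02,27,NA,"),
           file.path(tmp, "Csv2.csv"))

summary_df <- summarise_nulls(tmp)
print(summary_df)
```

Returning a data frame makes it easy to save the summary with `write.csv()`. If you save it as a `.csv` in the same folder, the next run will pick it up as input, so store it elsewhere or under a different extension.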
