2023-05-17 23:30:35

How to resolve error 'Class attribute on column 1 of different items do not match' in R

Question

I have many .txt files stored in multiple folders. Each file has several columns, one of which is temperature. In some files the temperature column is named T2 [°C], while in others it is named T2 [?C]. I want the column to be named T2 [°C] in every file, without changing any other column names. The files also do not all have the same number of columns: for example, some have Pressure, Temperature, Radiation, Wind velocity and Wind direction, while others have only Pressure, Temperature and Radiation. This can be treated as missing data, with the missing columns filled with NA. To fix both the column name and the differing number of columns I am using the following R code, but at the end it gives this error: Error in rbindlist(dt.tidied, fill = TRUE) : Class attribute on column 1 of item 142 does not match with column 1 of item 23.

install.packages("data.table")
library(data.table)
# List of files
filelist <- list.files("C:/Users/Akanksha/Desktop/BSRN/Test_Gz", full.names = TRUE, recursive = TRUE, pattern = "\\.txt$")
# Read the files
dt <- lapply(filelist, fread, skip = 27)
# Adjust Column names
dt.tidied <- lapply(dt, FUN = function(x){
  # Adjust ? to degree
  setnames(x, old = "T2 [?C]", new = "T2 [°C]", skip_absent = TRUE)
  colnames(x) <- gsub("\\[", "(", colnames(x))
  colnames(x) <- gsub("\\]", ")", colnames(x))
  # return
  return(x)
})
# Bind, filling missing columns to NA
merged <- rbindlist(dt.tidied, fill = TRUE, use.names = TRUE)

I checked the class attribute of both items and got the output below. Both return the same result, so I do not understand what is causing the error. Can anyone help?

class(dt.tidied[[23]][1])
[1] "data.table" "data.frame"
class(dt.tidied[[142]][1])
[1] "data.table" "data.frame"

d1=dput(dt.tidied[[23]])
structure(list(V1 = c(NA, NA, NA), V2 = c("SRad(SRAD)", "Temp [?C] (TT)", "Temp QCode (TTC)")), row.names = c(NA, -3L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: (0x00000152b22fe7b0)>)
d1=dput(dt.tidied[[142]])
956.902, 961.01, 965.114)), row.names = c(NA, -44615L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x000001afc82f7590>)
# The output of dput(dt.tidied[[142]]) was too large for me to see its first lines, so only the last few lines are pasted here.

The code also gives the following error after dt <- lapply(...):

Error in FUN(X[[i]], ...) : skip=27 but the input only has 25 lines
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Edit: I checked my data and found that a different number of rows needs to be skipped in different .txt files. Could this be what is causing the error, and how can I fix it? One approach I can think of is to start reading each file at the line after */, because in every file that line is the header and the data follow it.

Answer 1

Score: 1

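The dput you provided for dt.tidied[[142]] was incomplete, so I will illustrate what could be happening with an example. Let's assume your d1 and d2 data.tables look like this: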

library(data.table)
library(lubridate)
d1 = structure(
  list(V1 = c(NA, NA, NA),
       V2 = c("SRad(SRAD)","Temp [?C] (TT)", "Temp QCode (TTC)")),
  row.names = c(NA, -3L),
  class = c("data.table", "data.frame")
)
d2 = structure(
  list(V1 = c(NA, NA, NA),
       V2 = c(956.902, 961.01, 965.114)),
  row.names = c(NA, -3L),
  class = c("data.table", "data.frame")
)

We'll make a list to hold these two data.tables, storing them as elements 1 and 2:

dt.tidied.working <- vector("list", length = 2)
dt.tidied.working[[1]] <- d1
dt.tidied.working[[2]] <- d2

Now we can check the class of column 1 in each list element:

> class(dt.tidied.working[[1]]$V1)
[1] "logical"
> class(dt.tidied.working[[2]]$V1)
[1] "logical"

In this very simple example both columns have the same class, logical, so rbindlist does not raise the error you are getting.

merged <- rbindlist(dt.tidied.working, fill = TRUE, use.names = TRUE)
   V1               V2
1: NA       SRad(SRAD)
2: NA   Temp [?C] (TT)
3: NA Temp QCode (TTC)
4: NA          956.902
5: NA           961.01
6: NA          965.114

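Now let's change the class of the first column in the second data.table to what I think your full data.table might contain, based on your comments so far: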

## not working example
d1 = structure(
  list(V1 = c(NA, NA, NA),
       V2 = c("SRad(SRAD)","Temp [?C] (TT)", "Temp QCode (TTC)")),
  row.names = c(NA, -3L),
  class = c("data.table", "data.frame")
)
d2 = structure(
  list(V1 = ymd_hms(c("2004-08-01T00:00:00","2004-08-02T00:00:00","2004-08-03T00:00:00")),
       V2 = c(956.902, 961.01, 965.114)),
  row.names = c(NA, -3L),
  class = c("data.table", "data.frame")
)
dt.tidied.not.working <- vector("list", length = 2)
dt.tidied.not.working[[1]] <- d1
dt.tidied.not.working[[2]] <- d2

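Checking the classes again, as in the previous example, shows that the two columns now have different classes, and calling rbindlist produces the same kind of error you reported.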

> class(dt.tidied.not.working[[1]]$V1)
[1] "logical"
> class(dt.tidied.not.working[[2]]$V1)
[1] "POSIXct" "POSIXt"
> merged <- rbindlist(dt.tidied.not.working, fill = TRUE, use.names = TRUE)
Error in rbindlist(dt.tidied.not.working, fill = TRUE, use.names = TRUE) : 
  Class attribute on column 1 of item 2 does not match with column 1 of item 1.

Potential Solution

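Based on your dput of dt.tidied[[23]], it appears that you have NA values where you should have date-time values of the kind ymd_hms returns. This could be the problem you are running into. If so, inspect the elements of the dt list that holds the files you read in, to see which files have NA in column V1. If that is in fact the issue, you will need to decide what to do with those observations: remove them, or fix them by assigning the date/time values.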
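The inspection described above can be done in one pass rather than element by element. The following is a minimal sketch, assuming the problem is limited to the first column; `dt_list` is a made-up stand-in for your own list of tables, and treating the most common class as the "correct" one is an assumption that may not hold for your data.

```r
library(data.table)

# Hypothetical stand-in for the list of tables read from the files
dt_list <- list(
  data.table(V1 = as.POSIXct(c("2004-08-01 00:00:00", "2004-08-02 00:00:00"), tz = "UTC"),
             V2 = c(956.902, 961.01)),
  data.table(V1 = c(NA, NA), V2 = c("SRad(SRAD)", "Temp [?C] (TT)")),
  data.table(V1 = as.POSIXct("2004-08-03 00:00:00", tz = "UTC"), V2 = 965.114)
)

# Class of the first column of every element, collapsed to one string each
first_col_class <- vapply(
  dt_list,
  function(x) paste(class(x[[1]]), collapse = "/"),
  character(1)
)

# Tabulate the classes; the rare ones are the likely culprits
class_counts <- table(first_col_class)
print(class_counts)

# Indices of elements whose first column is not the most common class
# (assumption: the most common class is the intended one)
most_common <- names(class_counts)[which.max(class_counts)]
suspects <- which(first_col_class != most_common)
print(suspects)

# Bind only the elements that passed the check
keep <- setdiff(seq_along(dt_list), suspects)
merged <- rbindlist(dt_list[keep], fill = TRUE, use.names = TRUE)
```

In this toy list the second element is flagged, because its first column is logical while the others are POSIXct. With real data, `filelist[suspects]` would give the paths of the files worth opening by hand before deciding whether to drop or repair them.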

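Also, the d1=dput(dt.tidied[[142]]) output in your question is incomplete: it does not include the opening structure(... part the way your d1=dput(dt.tidied[[23]]) output does. Please provide complete and accurate information, as it makes it easier for us to help you.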

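If you don't want to inspect every single element of your dt list, you can use the rbind.fill function from the plyr package. Note that in the result below the date-time values in V1 are printed as plain numbers (seconds since 1970-01-01) rather than as dates, so they would need to be converted back afterwards: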

library(plyr)
dt.tidied.not.working <- lapply(dt.tidied.not.working, function(x) as.data.frame(x))
merged <- plyr::rbind.fill(dt.tidied.not.working)
> merged
           V1               V2
1        NA       SRad(SRAD)
2        NA   Temp [?C] (TT)
3        NA Temp QCode (TTC)
4 1.091e+09          956.902
5 1.091e+09           961.01
6 1.091e+09          965.114
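The edit to the question suggests that the files have metadata headers of different lengths, with the column names always on the line after `*/`. Under that assumption, a fixed skip = 27 reads header text as data in some files, which would explain both the "input only has 25 lines" error and tables like dt.tidied[[23]]. A sketch of reading each file from the marker instead might look like this; `read_after_marker` is a hypothetical helper, not part of data.table, and the temporary file holds made-up sample content rather than real data.

```r
library(data.table)

# Hypothetical helper: read a file whose metadata block ends with a line
# containing "*/", followed by the header row and then the data.
# Extra arguments in ... are passed on to fread().
read_after_marker <- function(path, marker = "*/", ...) {
  lines <- readLines(path, warn = FALSE)
  marker_lines <- which(grepl(marker, lines, fixed = TRUE))
  if (length(marker_lines) == 0L) {
    stop("No '", marker, "' line found in ", path)
  }
  # Skip everything up to and including the first marker line, and
  # treat the next line as the header (assumption taken from the question)
  fread(path, skip = marker_lines[1], header = TRUE, ...)
}

# Small self-made file to try the helper on (not real data)
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "/* DATA DESCRIPTION:",
  "Some metadata line",
  "*/",
  "Date/Time\tT2 [?C]\tPoPoPoPo [hPa]",
  "2004-08-01T00:00\t12.5\t956.902",
  "2004-08-01T00:01\t12.6\t961.010"
), tmp)

x <- read_after_marker(tmp, sep = "\t")
print(names(x))
```

With the real files, `dt <- lapply(filelist, read_after_marker)` would take the place of the `lapply(filelist, fread, skip = 27)` call, and the rest of the renaming and rbindlist code could stay as it is. Reading the whole file with readLines just to find the marker is simple but reads each file twice, which may matter for very large files.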
