2023年5月7日 07:08:35go评论67阅读模式

英文:

R/Dplyr: Data Frame - Filling in NA/NULL values with 0, Columns for each row to be filled in is based on value from each rows specific value

问题

I am new to this community.
我是新来的。

I am trying to code in R but having a little trouble with my code.
我正在尝试在R中编写代码，但我的代码有点问题。

#Create dataframe with 5 rows and 3 columns
#创建包含5行和3列的数据框
my_dataframe=data.frame(Student_Id = c( '1012501', '1012502', '1012503', '1012504', '1012505', '1012506'),
Student_Nm = c( 'John Doe', 'John Smith', 'John Simon', 'Jane Powel', 'Jane Smith', 'Jane Simon'),
School_Cd = c( '808971', '808972', '808973', '808974', '808975', '808976'),
Grade = c(1,1,2,2,3,4),
Age = c(6,6,7,7,8,9),
Join_Month_Count = c(36,18,2,9,3,22),
New_Student_FL = c(0,0,1,0,1,0),
Subject = c( 'Math', 'Science', 'Geography', 'Social_Study', 'Music', 'Math'),
Mar_22 = c(NA,NA,NA,NA,NA,NA),
Apt_22 = c(NA,9,NA,NA,NA,2),
May_22 = c(12,NA,NA,NA,NA,9),
Jun_22 = c(NA,8,NA,NA,NA,8),
Jul_22 = c(1,NA,NA,8,NA,NA),
Aug_22 = c(NA,NA,NA,NA,NA,7),
Sep_22 = c(9,8,NA,NA,NA,NA),
Oct_22 = c(NA,NA,NA,8,NA,NA),
Nov_22 = c(26,NA,NA,NA,NA,4),
Dec_22 = c(NA,7,NA,NA,NA,NA),
Jan_23 = c(4,6,NA,8,NA,1),
Feb_23 = c(8,NA,4,NA,12,1),
Mar_23 = c(9,NA,NA,8,NA,5))

#Display dataframe
#显示数据框
print(my_dataframe)

The sample table is below:
示例表格如下：
R/Dplyr：数据框 – 用0填充NA/NULL值，要填充的列基于每行的特定值。

I am trying to fill in the data frame's column (9 to 21) existing NA or Null values with 0. from Column 9 to Column 21 (Mar_22 to Mar_23 in sample data).
1.我试图用0填充数据框的列（从9到21），将现有的NA或Null值替换。从第9列到第21列（在示例数据中为Mar_22到Mar_23）。
The formula needs to check what the value of the "Join_Month_Count" is and
2.该公式需要检查"Join_Month_Count"的值，并
subtract the "Join_Month_Count" value from 22 (total column count + 1 ) to get the target column index. if the index is less than 9 then the starting column index will be 9, else start from calculated index column to the last column and
3.将"Join_Month_Count"的值从22（总列数+1）中减去，以获得目标列索引。如果索引小于9，则起始列索引将为9，否则从计算的索引列开始到最后一列，并
Fill in any NA/NULL value to 0
4.将任何NA/NULL值填充为0

For example: In the sample dataset,
例如：在示例数据集中，
Row 1: Col 9 through col 21, NA will be filled with 0
行1：从第9列到第21列，NA将被填充为0
Row 2: Col 9 through col 21, NA will be filled with 0
行2：从第9列到第21列，NA将被填充为0
Row 3: Col 20 through col 21, NA will be filled with 0, every other columns will remain as it is
行3：从第20列到第21列，NA将被填充为0，其他列将保持不变
Row 4: Col 13 through col 21, NA will be filled with 0, every other columns will remain as it is
行4：从第13列到第21列，NA将被填充为0，其他列将保持不变
Row 5: Col 19 through col 21, NA will be filled with 0, every other columns will remain as it is
行5：从第19列到第21列，NA将被填充为0，其他列将保持不变
Row 6: Col 9 through col 21, NA will be filled with 0
行6：从第9列到第21列，NA将被填充为0

Currently, I am using a for loop as below, but it is not working.
目前，我正在使用如下的for循环，但它不起作用。

Code is filling all Na with 0.
代码正在用0填充所有的NA。

Is there an easier way to do it and faster?
是否有更简单且更快的方法来实现呢？

My R Code is as below.
我的R代码如下。

for(i in 1:nrow(my_dataframe)){
  ref_val = 22 - my_dataframe[i,6]    ## 6 is the index of the "Join_Month_Count" column
  
  if(ref_val < 9){          ## 9 is the column index of the month start
    ref_val = 9
  }

my_dataframe[ , ref_val:21][is.na(my_dataframe[ , ref_val:21])] <- 0
}

and this

Replace NA on multiple columns by Index

library("tidyr")
library("dplyr")
my_dataframe <- my_dataframe %>%
mutate(across(c(Sep_22:Mar_23),~ ifelse(is.na(.), 0, .)))

The Following Seems to work, but really Slow

for(i in 1:nrow(my_dataframe)){
ref_val = 22 - my_dataframe[i,6] ## 6 is the index of the "Join_Month_Count" column

if(ref_val < 9){ ## 9 is the column index of the month start

英文:

I am new to this community.

I am trying to code in R but having a little trouble with my code.

#Create dataframe with 5 rows and 3 columns
my_dataframe=data.frame(Student_Id = c( &#39;1012501&#39;, &#39;1012502&#39;, &#39;1012503&#39;, &#39;1012504&#39;, &#39;1012505&#39;, &#39;1012506&#39;), 	
                        Student_Nm = c( &#39;John Doe&#39;, &#39;John Smith&#39;, &#39;John Simon&#39;, &#39;Jane Powel&#39;, &#39;Jane Smith&#39;, &#39;Jane Simon&#39;), 	
                        School_Cd = c( &#39;808971&#39;, &#39;808972&#39;, &#39;808973&#39;, &#39;808974&#39;, &#39;808975&#39;, &#39;808976&#39;), 	
                        Grade = c(1,1,2,2,3,4), 	
                        Age = c(6,6,7,7,8,9), 	
                        Join_Month_Count = c(36,18,2,9,3,22), 	
                        New_Student_FL = c(0,0,1,0,1,0), 	
                        Subject = c( &#39;Math&#39;, &#39;Science&#39;, &#39;Geography&#39;, &#39;Social_Study&#39;, &#39;Music&#39;, &#39;Math&#39;), 	
                        Mar_22 = c(NA,NA,NA,NA,NA,NA), 	
                        Apt_22 = c(NA,9,NA,NA,NA,2), 	
                        May_22 = c(12,NA,NA,NA,NA,9), 	
                        Jun_22 = c(NA,8,NA,NA,NA,8), 	
                        Jul_22 = c(1,NA,NA,8,NA,NA), 	
                        Aug_22 = c(NA,NA,NA,NA,NA,7), 	
                        Sep_22 = c(9,8,NA,NA,NA,NA), 	
                        Oct_22 = c(NA,NA,NA,8,NA,NA), 	
                        Nov_22 = c(26,NA,NA,NA,NA,4), 	
                        Dec_22 = c(NA,7,NA,NA,NA,NA), 	
                        Jan_23 = c(4,6,NA,8,NA,1), 	
                        Feb_23 = c(8,NA,4,NA,12,1), 	
                        Mar_23 = c(9,NA,NA,8,NA,5))

#Display dataframe
print(my_dataframe)

The sample table is below:
R/Dplyr：数据框 – 用0填充NA/NULL值，要填充的列基于每行的特定值。

I am trying to fill in the data frame's column (9 to 21) existing NA or Null values with 0. from Column 9 to Column 21 (Mar_22 to Mar_23 in sample data).
The formula needs to check what the value of the "Join_Month_Count" is and
subtract the "Join_Month_Count" value from 22 (total column count + 1 ) to get the target column index. if the index is less than 9 then the starting column index will be 9, else start from calculated index column to the last column and
Fill in any NA/NULL value to 0

For example: In the sample dataset,
Row 1: Col 9 through col 21, NA will be filled with 0
Row 2: Col 9 through col 21, NA will be filled with 0
Row 3: Col 20 through col 21, NA will be filled with 0, every other columns will remain as it is
Row 4: Col 13 through col 21, NA will be filled with 0, every other columns will remain as it is
Row 5: Col 19 through col 21, NA will be filled with 0, every other columns will remain as it is
Row 6: Col 9 through col 21, NA will be filled with 0

Currently, I am using a for loop as below, but it is not working.
Code is filling all Na with 0.
Is there an easier way to do it and faster?

My R Code is as below.

for(i in 1:nrow(my_dataframe)){
  ref_val = 22 - my_dataframe[i,6]    ## 6 is the index of the &quot;Join_Month_Count&quot; column
  
  if(ref_val &lt; 9){          ## 9 is the column index of the month start
    ref_val = 9
  }

my_dataframe[ , ref_val:21][is.na(my_dataframe[ , ref_val:21])] &lt;- 0
}

and this 

# Replace NA on multiple columns by Index
library(&quot;tidyr&quot;)
library(&quot;dplyr&quot;)
my_dataframe &lt;- my_dataframe %&gt;% 
  mutate(across(c(Sep_22:Mar_23),~ ifelse(is.na(.), 0, .)))

The Following Seems to work, but really Slow


for(i in 1:nrow(my_dataframe)){
  ref_val = 22 - my_dataframe[i,6]    ## 6 is the index of the &quot;Join_Month_Count&quot; column
  
  if(ref_val &lt; 9){          ## 9 is the column index of the month start
    ref_val = 9
  }
  print(ref_val)
  my_dataframe[i, ref_val:21][is.na(my_dataframe[i, ref_val:21])] &lt;- 0
}

#Display dataframe
print(my_dataframe)

Thanks,
JC

答案1

得分: 1

这是一个基于R语言的方法。

tmp <- lapply(pmin(pmax(22 - my_dataframe$Join_Month_Count, 9), 21) - 8,
              function(z) replace(rep(NA, 13), z:13, 0))
tmp <- setNames(do.call(rbind.data.frame, tmp), names(my_dataframe)[9:21])
tmp
#   Mar_22 Apt_22 May_22 Jun_22 Jul_22 Aug_22 Sep_22 Oct_22 Nov_22 Dec_22 Jan_23 Feb_23 Mar_23
# 1      0      0      0      0      0      0      0      0      0      0      0      0      0
# 2      0      0      0      0      0      0      0      0      0      0      0      0      0
# 3     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA      0      0
# 4     NA     NA     NA     NA      0      0      0      0      0      0      0      0      0
# 5     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA      0      0      0
# 6      0      0      0      0      0      0      0      0      0      0      0      0      0

my_dataframe[,9:21] <- Map(dplyr::coalesce, my_dataframe[,9:21], tmp)
my_dataframe
#   Student_Id Student_Nm School_Cd Grade Age Join_Month_Count New_Student_FL      Subject Mar_22 Apt_22 May_22 Jun_22 Jul_22 Aug_22 Sep_22 Oct_22 Nov_22 Dec_22 Jan_23 Feb_23 Mar_23
# 1    1012501   John Doe    808971     1   6               36              0         Math      0      0     12      0      1      0      9      0     26      0      4      8      9
# 2    1012502 John Smith    808972     1   6               18              0      Science      0      9      0      8      0      0      8      0      0      7      6      0      0
# 3    1012503 John Simon    808973     2   7                2              1    Geography     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA      4      0
# 4    1012504 Jane Powel    808974     2   7                9              0 Social_Study     NA     NA     NA     NA      8      0      0      8      0      0      8      0      8
# 5    1012505 Jane Smith    808975     3   8                3              1        Music     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA      0     12      0
# 6    1012506 Jane Simon    808976     4   9               22              0         Math      0      2      9      8      0      7      0      0      4      0      1      1      5

英文:

Here's a base R approach.

tmp &lt;- lapply(pmin(pmax(22 - my_dataframe$Join_Month_Count, 9), 21) - 8,
              function(z) replace(rep(NA, 13), z:13, 0))
str(tmp, vec.len = 13)
# List of 6
#  $ : num [1:13] 0 0 0 0 0 0 0 0 0 0 0 0 0
#  $ : num [1:13] 0 0 0 0 0 0 0 0 0 0 0 0 0
#  $ : num [1:13] NA NA NA NA NA NA NA NA NA NA NA 0 0
#  $ : num [1:13] NA NA NA NA 0 0 0 0 0 0 0 0 0
#  $ : num [1:13] NA NA NA NA NA NA NA NA NA NA 0 0 0
#  $ : num [1:13] 0 0 0 0 0 0 0 0 0 0 0 0 0
tmp &lt;- setNames(do.call(rbind.data.frame, tmp), names(my_dataframe)[9:21])
tmp
#   Mar_22 Apt_22 May_22 Jun_22 Jul_22 Aug_22 Sep_22 Oct_22 Nov_22 Dec_22 Jan_23 Feb_23 Mar_23
# 1      0      0      0      0      0      0      0      0      0      0      0      0      0
# 2      0      0      0      0      0      0      0      0      0      0      0      0      0
# 3     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA      0      0
# 4     NA     NA     NA     NA      0      0      0      0      0      0      0      0      0
# 5     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA      0      0      0
# 6      0      0      0      0      0      0      0      0      0      0      0      0      0

This first step creates a same-shape data.frame of the columns we need (9:21), with 0 if we want to replace NAs with 0, and NA otherwise.

With this, we can use dplyr::coalesce column-wise on the columns needed:

my_dataframe[,9:21] &lt;- Map(dplyr::coalesce, my_dataframe[,9:21], tmp)
my_dataframe
#   Student_Id Student_Nm School_Cd Grade Age Join_Month_Count New_Student_FL      Subject Mar_22 Apt_22 May_22 Jun_22 Jul_22 Aug_22 Sep_22 Oct_22 Nov_22 Dec_22 Jan_23 Feb_23 Mar_23
# 1    1012501   John Doe    808971     1   6               36              0         Math      0      0     12      0      1      0      9      0     26      0      4      8      9
# 2    1012502 John Smith    808972     1   6               18              0      Science      0      9      0      8      0      0      8      0      0      7      6      0      0
# 3    1012503 John Simon    808973     2   7                2              1    Geography     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA      4      0
# 4    1012504 Jane Powel    808974     2   7                9              0 Social_Study     NA     NA     NA     NA      8      0      0      8      0      0      8      0      8
# 5    1012505 Jane Smith    808975     3   8                3              1        Music     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA      0     12      0
# 6    1012506 Jane Simon    808976     4   9               22              0         Math      0      2      9      8      0      7      0      0      4      0      1      1      5

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

R/Dplyr：数据框 – 用0填充NA/NULL值，要填充的列基于每行的特定值。

问题

Replace NA on multiple columns by Index

答案1

如何在 vscode 界面上以交互方式查看 R 中的颜色名称？

Preventing overplotting for specific data points in ggplot2 scatterplot.

如果模式在gsub中匹配，则仅返回值。

如何使用dplyr合并具有不同行的多个数据框。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论