Split text into fixed-length chunks, respecting a delimiter
Question
I want to split text into chunks of a fixed length of 200 characters, breaking only at a delimiter (a space in this example). Here is some sample data; I would like to derive col, col1, and col2 from the text variable.
df <- data.frame(id=1,text='The INDEXC function searches source from left to right, for the first occurrence of any character present in the
excerpts and returns the position in source of that character. If none of the characters in excerpt-1 through excerpt-n
in source are found, INDEXC returns a value of 0. The REVERSE function creates backward writing, such that the
last character in the argument becomes the first character in the result, the next-to-last character in the argument
becomes the second character in the result, and so on. Below macro combines INDEXC/REVERSE function to
automatically derive COVAL-COVALn based on comment’s length.',
col='The INDEXC function searches source from left to right, for the first occurrence of any character present in the
excerpts and returns the position in source of that character. If none of the characters in excerpt-1 through excerpt-n ',
col1='in source are found, INDEXC returns a value of 0. The REVERSE function creates backward writing, such that the
last character in the argument becomes the first character in the result, the next-to-last character in the argument
becomes',
col2='the second character in the result, and so on. Below macro combines INDEXC/REVERSE function to
automatically derive COVAL-COVALn based on comment’s length.')
Answer 1
Score: 1
You can use data.table and stringi:
library(data.table)
library(stringi)
df <- data.frame(id = 1, text = "The INDEXC function searches source from left to right, for the first occurrence of any character present in the excerpts and returns the position in source of that character. If none of the characters in excerpt-1 through excerpt-n in source are found, INDEXC returns a value of 0. The REVERSE function creates backward writing, such that the last character in the argument becomes the first character in the result, the next-to-last character in the argument becomes the second character in the result, and so on. Below macro combines INDEXC/REVERSE function to automatically derive COVAL-COVALn based on comment's length.")
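# Convert to a data.table in place so := can add columns by reference
setDT(df)
# Greedily take up to 220 characters, then backtrack to a word boundary (\b).
# Note: the question asks for 200; change 220 to 200 for that limit.
matches <- unlist(stri_match_all_regex(df$text, ".{1,220}(?=\\b)"))
# One column name per chunk: col1, col2, ...
cols <- sprintf("col%s", seq_along(matches))
df[, (cols) := as.list(matches)]
df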
id
1: 1
text
1: The INDEXC function searches source from left to right, for the first occurrence of any character present in the excerpts and returns the position in source of that character. If none of the characters in excerpt-1 through excerpt-n in source are found, INDEXC returns a value of 0. The REVERSE function creates backward writing, such that the last character in the argument becomes the first character in the result, the next-to-last character in the argument becomes the second character in the result, and so on. Below macro combines INDEXC/REVERSE function to automatically derive COVAL-COVALn based on comment's length.
col1
1: The INDEXC function searches source from left to right, for the first occurrence of any character present in the excerpts and returns the position in source of that character. If none of the characters in excerpt-1
col2
1: through excerpt-n in source are found, INDEXC returns a value of 0. The REVERSE function creates backward writing, such that the last character in the argument becomes the first character in the result, the next-to-last
col3
1: character in the argument becomes the second character in the result, and so on. Below macro combines INDEXC/REVERSE function to automatically derive COVAL-COVALn based on comment's length
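One caveat with the pattern above: `\b` matches at any word boundary, not only next to a space, so a chunk could end inside a hyphenated token such as `excerpt-1`, and the trailing period is dropped from the last chunk (col3 ends in "length"). If the split must happen strictly at spaces, a greedy word-by-word loop is one option. The following is only a minimal sketch in base R; `split_at_spaces` and its `width` argument are hypothetical names introduced here, not part of the original answer or of any package. It assumes words are separated by spaces and leaves a word longer than `width` as its own over-long chunk.

```r
# Hypothetical helper (not from the original answer): split a single string
# into chunks of at most `width` characters, breaking only at spaces.
split_at_spaces <- function(x, width = 200L) {
  words <- strsplit(x, " +")[[1]]
  words <- words[nzchar(words)]          # drop empty pieces from leading spaces
  chunks <- character(0)
  current <- ""
  for (w in words) {
    candidate <- if (nzchar(current)) paste(current, w) else w
    if (nchar(candidate) <= width || !nzchar(current)) {
      # The word still fits, or it is the first word of a new chunk
      current <- candidate
    } else {
      # The word does not fit: close the current chunk and start a new one
      chunks <- c(chunks, current)
      current <- w
    }
  }
  if (nzchar(current)) chunks <- c(chunks, current)
  chunks
}

split_at_spaces("aa bb cc dd", width = 5)
# "aa bb" "cc dd"
```

The result is a character vector, so it can be attached as columns in the same way as `matches` in the answer, for example with `as.list(chunks)`. Runs of several spaces are collapsed to one when the chunks are rebuilt, which may or may not be what you want.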