Split text into fixed-length chunks, respecting a delimiter

Question

I want to split text into chunks of at most 200 characters, breaking only at a delimiter (a space in this example). Here is sample data; I would like to derive col, col1, and col2 from the text variable.

df <- data.frame(id=1,text='The INDEXC function searches source from left to right, for the first occurrence of any character present in the 
excerpts and returns the position in source of that character. If none of the characters in excerpt-1 through excerpt-n 
in source are found, INDEXC returns a value of 0. The REVERSE function creates backward writing, such that the 
last character in the argument becomes the first character in the result, the next-to-last character in the argument 
becomes the second character in the result, and so on. Below macro combines INDEXC/REVERSE function to
automatically derive COVAL-COVALn based on comment’s length.',
                 col='The INDEXC function searches source from left to right, for the first occurrence of any character present in the 
excerpts and returns the position in source of that character. If none of the characters in excerpt-1 through excerpt-n ',
                 col1='in source are found, INDEXC returns a value of 0. The REVERSE function creates backward writing, such that the 
last character in the argument becomes the first character in the result, the next-to-last character in the argument 
becomes',
                 col2='the second character in the result, and so on. Below macro combines INDEXC/REVERSE function to
automatically derive COVAL-COVALn based on comment’s length.')

Answer 1

Score: 1

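You can use data.table and stringi. The pattern .{1,220}(?=\\b) greedily takes up to 220 characters and then backs off to the nearest word boundary, so no word is cut in half; replace 220 with 200 to apply the limit from the question. Note that a trailing character with no word boundary after it, such as the final period, is not captured.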

library(data.table)
library(stringi)

df <- data.frame(id = 1, text = "The INDEXC function searches source from left to right, for the first occurrence of any character present in the excerpts and returns the position in source of that character. If none of the characters in excerpt-1 through excerpt-n in source are found, INDEXC returns a value of 0. The REVERSE function creates backward writing, such that the last character in the argument becomes the first character in the result, the next-to-last character in the argument becomes the second character in the result, and so on. Below macro combines INDEXC/REVERSE function to automatically derive COVAL-COVALn based on comment's length.")
setDT(df)
matches <- unlist(stri_match_all_regex(df$text, ".{1,220}(?=\\b)"))
cols <- sprintf("col%s", seq_along(matches))
df[, (cols) := as.list(matches)]
df

  id
1:  1
text
1: The INDEXC function searches source from left to right, for the first occurrence of any character present in the excerpts and returns the position in source of that character. If none of the characters in excerpt-1 through excerpt-n in source are found, INDEXC returns a value of 0. The REVERSE function creates backward writing, such that the last character in the argument becomes the first character in the result, the next-to-last character in the argument becomes the second character in the result, and so on. Below macro combines INDEXC/REVERSE function to automatically derive COVAL-COVALn based on comment's length.
                                                                                                                                                                                                                      col1
1: The INDEXC function searches source from left to right, for the first occurrence of any character present in the excerpts and returns the position in source of that character. If none of the characters in excerpt-1 
                                                                                                                                                                                                                           col2
1: through excerpt-n in source are found, INDEXC returns a value of 0. The REVERSE function creates backward writing, such that the last character in the argument becomes the first character in the result, the next-to-last 
                                                                                                                                                                                           col3
1: character in the argument becomes the second character in the result, and so on. Below macro combines INDEXC/REVERSE function to automatically derive COVAL-COVALn based on comment's length
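
If breaks should fall strictly at spaces (a \b boundary also occurs next to hyphens and punctuation, so a token like excerpt-1 could be split), one alternative is to pack words greedily in base R. This is only a minimal sketch: split_fixed is a hypothetical helper name, not a function from any package, and it assumes that runs of spaces may be collapsed and that a single word longer than the limit is kept whole. A short sentence with a width of 15 is used so the result is easy to check by hand.

```r
# Hypothetical helper (not from any package): greedily pack space-separated
# words into chunks of at most `width` characters without breaking a word.
# Assumption: a single word longer than `width` is kept whole in its own chunk.
split_fixed <- function(x, width = 200) {
  words <- strsplit(x, " ", fixed = TRUE)[[1]]
  words <- words[nzchar(words)]          # drop empty tokens from repeated spaces
  chunks <- character(0)
  current <- ""
  for (w in words) {
    candidate <- if (nzchar(current)) paste(current, w) else w
    if (!nzchar(current) || nchar(candidate) <= width) {
      current <- candidate               # the word still fits in this chunk
    } else {
      chunks <- c(chunks, current)       # close the chunk and start a new one
      current <- w
    }
  }
  if (nzchar(current)) chunks <- c(chunks, current)
  chunks
}

txt <- "The quick brown fox jumps over the lazy dog"
df <- data.frame(id = 1, text = txt, stringsAsFactors = FALSE)

parts <- split_fixed(df$text[1], width = 15)
# Add one column per chunk: col1, col2, ...
df[paste0("col", seq_along(parts))] <- as.list(parts)
print(parts)
```

For the data in the question, calling split_fixed(df$text[1], width = 200) would apply the 200-character limit. Unlike the regex approach, the chunks carry no trailing space and the final period is retained.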

huangapple
  • Posted on July 4, 2023, 23:58:31
  • Please keep this link when reposting: https://go.coder-hub.com/76614284.html