February 10, 2023, 06:03:51

Is there a SAS function to flag a word that repeats in order across columns?

Question

Is there a way to flag rows where the word 'Add' is in sequence without any other word or missing in between?

I tried the array statement with the find function, but no luck!

Answer 1

Score: 1

This code will find all sequences of Add where there are at least two Adds in a row and save all of the sequences to a single comma-separated variable.

Sample data:

data have;
    input t1$ t2$ t3$ t4$ t5$ t6$ t7$ t8$ t9$ t10$;
    datalines;
Add Add No Add No Add . No Add .
Add No Add Add Add Add . . No .
Add Add Add No Add Add Add Add . .
;
run;

Code:

data want;
    set have;

    array t[*] t:;
    array col[10] $;
    length sequences $50;

    /* Check if the current and previous values are both 'Add' */
    do i = 2 to dim(t);
        if(t[i] = 'Add' AND t[i-1] = 'Add') then do;
            col[i]   = vname(t[i]);
            col[i-1] = vname(t[i-1]);
        end;
    end;

    /* Create a comma-separated list of sequence ranges. For example:
       t1-t3,t5-t8
       t3-t6
       etc.
    */
    flag_start = 0;

    do i = 1 to dim(col);

        /* Find the start of the sequence */
        if(col[i] NE ' ' AND NOT flag_start) then do;
            seq_start  = col[i];
            flag_start = 1;
        end;

        /* Find the end of the sequence, then calculate its range and save it */
        else if(col[i] = ' ' AND flag_start) then do;
            seq_end    = col[i-1];
            flag_start = 0;
            seq_range  = cats(seq_start, '-', seq_end);
            sequences  = catx(',', sequences, seq_range);
        end;
    end;

    /* A sequence that runs through the last column has no blank after it, so save it here */
    if flag_start then sequences = catx(',', sequences, cats(seq_start, '-', col[dim(col)]));

    drop i flag_start seq_start seq_end seq_range col:;
run;

Output:

t1	t2	t3	t4	t5	t6	t7	t8	t9	t10	sequences
Add	Add	No	Add	No	Add		No	Add		t1-t2
Add	No	Add	Add	Add	Add			No		t3-t6
Add	Add	Add	No	Add	Add	Add	Add			t1-t3,t5-t8

Answer 2

Score: 1

The presence of a target word at a T<index> column can be flagged using a binary value, setting the bits appropriately.

Example:

Flag up to 32 columns. For more than 32 columns you would need additional flag variables and some extra bookkeeping when calculating the flag value.

data have;
    input (t1-t10) ($);
    datalines;
Add Add No Add No Add . No Add .
Add No Add Add Add Add . . No .
Add Add Add No Add Add Add Add . .
;


data want;
  set have;
  array ts t1-t10;
  flag = 0;
  do over ts;
    flag = BOR (flag, BLSHIFT(ts='Add', _i_-1));
  end;

  format flag binary32.;
run;
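
The flag above only records where 'Add' occurs; it does not yet say whether two of them sit next to each other. One possible way to finish the job, as a sketch rather than part of the original answer, is to AND the flag with a copy of itself shifted right by one bit. The names `want_runs` and `has_run` are made up for this illustration; `BAND` and `BRSHIFT` are the standard SAS bitwise functions.

```sas
data want_runs;
  set want;

  /* Shifting right by one bit lines up each column's bit with the bit of
     the column to its right. A non-zero AND therefore means that at least
     one pair of adjacent columns both contain 'Add'. */
  has_run = (band(flag, brshift(flag, 1)) > 0);
run;
```

With the sample data, all three rows get has_run = 1, because each row contains at least one pair of adjacent 'Add' values.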

Answer 3

Score: 0

I have a few solutions depending on when and how many sequences are allowed.

First, a sequence is defined as 2 or more consecutive time periods with 'Add'. For my solutions I used Richard's sample data.

Solution 1: Valid sequences begin at T1 until a break

* valid sequence begins at t1 until a break;
data want1;
set have;
length sequence $20;
if t1 = 'Add' and t2 = 'Add';  * if either T1 or T2 <> 'Add' then move on to next obs;

array t(*) t1-t10;

do i = 3 to dim(t);  * start loop at t3 since we know t1 & t2 = 'Add';
    if t[i] ne 'Add' then leave;  * first break found. stop scanning;
end;
* i is now the break position, or dim(t)+1 if the sequence ran through the last column;
sequence = cats('T1-T', put(i-1, 2.));
drop i;
run;

Result:

t1	t2	t3	t4	t5	t6	t7	t8	t9	t10	sequence
Add	Add	No	Add	No	Add		No	Add		T1-T2
Add	Add	Add	No	Add	Add	Add	Add			T1-T3

Solution 2: The next solution still detects valid sequences beginning at T1, but allows breaks and other sequences beyond the first one.

* sequence begins at t1 with a break and another sequence occurs on same row;
data want2;
set have;
length sequence $20;
if t1 = 'Add' and t2 = 'Add';  * if either T1 or T2 <> 'Add' then move to next obs;

array t(*) t1-t10;

seq_strt = 1;  * start of sequence. start at 1 because of subsetting if;
break = 0;     * flag for break in sequence. start at 0 because of subsetting if;
sequence = '';

do i = 3 to dim(t);  * start loop at t3 since we know t1 & t2 = 'Add';
    * start of sequence - 2 consecutive 'Add' during break;
    if break = 1 and t[i] = 'Add' and t[i-1] = 'Add' then do;  * start of new sequence;
        break = 0;
        seq_strt = i-1;
    end;
    * end of sequence;
    else if break = 0 and t[i] ne 'Add' then do;
        break = 1;  * flag a break;
        sequence = catx(',', sequence, cats('T', put(seq_strt, 2.), '-T', put(i-1, 2.)));
    end;
end;
* a sequence still open after the loop runs through the last column, so save it here;
if break = 0 then sequence = catx(',', sequence, cats('T', put(seq_strt, 2.), '-T', put(dim(t), 2.)));
drop i seq_strt break;
run;

Result:

t1	t2	t3	t4	t5	t6	t7	t8	t9	t10	sequence
Add	Add	No	Add	No	Add		No	Add		T1-T2
Add	Add	Add	No	Add	Add	Add	Add			T1-T3,T5-T8

Finally, the last solution detects any sequence in any time period.

* capture any sequence at any period of time;
data want3;
set have;
length sequence $20;

array t(*) t1-t10;

seq_strt = 0;  * start of sequence;
break = 1;     * flag for break in sequence. start with break until new seq is found;
sequence = '';

do i = 2 to dim(t);  * start loop at t2 to compare with t1;
    * start of sequence - 2 consecutive 'Add' during break;
    if break = 1 and t[i] = 'Add' and t[i-1] = 'Add' then do;  * start of new sequence;
        break = 0;
        seq_strt = i-1;
    end;
    * end of sequence;
    else if break = 0 and t[i] ne 'Add' then do;
        break = 1;  * flag a break;
        sequence = catx(',', sequence, cats('T', put(seq_strt, 2.), '-T', put(i-1, 2.)));
    end;
end;
* a sequence still open after the loop runs through the last column, so save it here;
if break = 0 then sequence = catx(',', sequence, cats('T', put(seq_strt, 2.), '-T', put(dim(t), 2.)));
drop i seq_strt break;
run;

Result:

t1	t2	t3	t4	t5	t6	t7	t8	t9	t10	sequence
Add	Add	No	Add	No	Add		No	Add		T1-T2
Add	No	Add	Add	Add	Add			No		T3-T6
Add	Add	Add	No	Add	Add	Add	Add			T1-T3,T5-T8
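
If all that is needed is a yes/no flag per row rather than the list of ranges, the definition used above (two or more consecutive 'Add' values) can be tested directly. This is a minimal sketch, assuming the same `have` dataset with columns t1-t10; `flagged` and `add_seq` are hypothetical names chosen for the example.

```sas
data flagged;
    set have;
    array t[*] t1-t10;

    add_seq = 0;  /* becomes 1 once two adjacent 'Add' values are found */

    /* Compare each column with the one before it; the UNTIL clause
       stops the scan as soon as the first adjacent pair is found. */
    do i = 2 to dim(t) until (add_seq);
        if t[i] = 'Add' and t[i-1] = 'Add' then add_seq = 1;
    end;

    drop i;
run;
```

With the sample data, add_seq is 1 on every row. A row such as `Add No Add No ...`, where no two 'Add' values touch, would keep add_seq = 0.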
