选择具有最大字符长度的行在 SAS 中。

huangapple go评论78阅读模式
英文:

Choosing the row with the maximum character length in sas

问题

我有以下数据集:

    dataseta:
    No. Name1 Name2                       Sales Inv Comp
    1   TC    Tribal Council Inc          100   100  0
    2.  TC    Tribal Council Limited INC  20    25   65

期望的输出是:

    datasetb:
    No. Name1 Name2                       Sales Inv Comp
    1   TC    Tribal Council Limited Inc  120   125  0

基本上,我需要选择具有列"Name2"中字符长度最长的行。我尝试了以下代码,但它没有起作用:

proc sql;
create table datasetb as select no,name1,name2,sum(sales),sum(inv),min(comp) from dataseta group by 1,2,3 having length(name2)=max(length(name2));quit;

如果我使用以下代码,它只部分解决了问题,并且我会得到重复的行:

proc sql;
create table datasetb as select no,name1,max(length(name2)),sum(sales),sum(inv),min(comp) from dataseta group by 1,2 having length(name2)=max(length(name2));quit;
英文:

I have the following dataset:

dataseta:
No. Name1 Name2                       Sales Inv Comp
1   TC    Tribal Council Inc          100   100  0
2.  TC    Tribal Council Limited INC  20    25   65

desired output:
    datasetb:
    No. Name1 Name2                       Sales Inv Comp
    1   TC    Tribal Council Limited Inc  120   125  0

Basically, I need to choose the row with the maximum length of characters for the column name2.
I tried the following, but it didn't work

proc sql;
create table datasetb as select no,name1,name2,sum(sales),sum(inv),min(comp) from dataseta group by 1,2,3 having length(name2)=max(length(name2));quit;

If I do the following code, it only partially resolves it, and I get duplicate rows

proc sql;
create table datasetb as select no,name1,max(length(name2)),sum(sales),sum(inv),min(comp) from dataseta group by 1,2 having length(name2)=max(length(name2));quit;

答案1

得分: 0

你似乎正在将两个独立的聚合计算结果合并。

假设:

  • no 是唯一的,以允许决定胜负的标准,而第一个(按 no 排序)最长的 name2 将与 name1 上的 costinvcomp 总额合并。

这个查询将涉及很多操作...

  • name1 内第一个最长的 name2,需要嵌套子查询来实现:

    • 确定最长的 name2,然后
    • 根据 no 选择第一个,如果有多个。
  • name1 上的总计

    • 这些总计将是一个子查询,用于生成所需的结果集。

示例(SQL)

data have;
length no 8 name1 $6 name2 $35 sales inv comp 8;
input 
no name1& name2&                      sales inv  comp; datalines;
1   TC    Tribal Council Inc          100   100  0    * name1=TC group
2   TC    Tribal Council Limited INC  20    25   65
3   TC    Tribal council co           0     0    0 
4   TC    The Tribal council Assoctn  10    10   10
7   LS    Longshore association       10    10   0    * name=LS group
8   LS    The Longshore Group, LLC    2     4    8
9   LS    The Longshore Group, llc    15    15   6
run;

proc sql;
  create table want as

  select 
    first_longest_name2.no,
    first_longest_name2.name1,
    first_longest_name2.name2,
    name1_totals.sales,
    name1_totals.inv,
    name1_totals.comp

  FROM

    (
      select 
        no, name1, name2 
      from 
        ( select 
            no, name1, name2 
          from have
          group by name1
          having length(name2) = max(length(name2))
        ) longest_name2s
      group by name1
      having no = min(no)
    ) as
    first_longest_name2

  LEFT JOIN 

    ( 
      select  
        name1,
        sum(sales) as sales,
        sum(inv) as inv,
        sum(comp) as comp
      from 
        have
      group by name1
    ) as
    name1_totals

  ON

    first_longest_name2.name1 = name1_totals.name1
;
quit;

示例(DATA Step)

按连续的 name1 分组行的方式处理数据,可以使用 DOW 循环技术来实现 - 即在其中使用 SET 语句的循环。

data want2;
  do until (last.name1);
    set have;
    by name1 notsorted;

    if length(name2) > longest then do;
      longest = length(name2);
      no_at_longest = no;
      name2_at_longest = name2;
    end;

    sales_sum = sum(sales_sum,sales);
    inv_sum   = sum(inv_sum,inv);
    comp_sum  = sum(comp_sum,comp);
  end;

  drop name2 no sales inv comp longest;

  rename 
    no_at_longest = no
    name2_at_longest = name2
    sales_sum = sales
    inv_sum = inv
    comp_sum = comp
  ;
run;
英文:

You appear to be joining the results of two separate aggregate computations.

Presuming:

no is unique so as to allow a tie breaker criteria and the first (per no) longest name2 is to be joined with the cost, inv, comp totals over name1.

The query will have lots going on...

  • 1st longest name2 within name1, nested subqueries are needed to:

    • Determine the longest name2, then
    • Select first one, according to no, if more than one.
  • totals over name1

    • The totals will be a sub-query that is joined to, for delivering the desired result set.

Example (SQL)

data have;
length no 8 name1 $6 name2 $35 sales inv comp 8;
input 
no name1& name2&                      sales inv  comp; datalines;
1   TC    Tribal Council Inc          100   100  0    * name1=TC group
2   TC    Tribal Council Limited INC  20    25   65
3   TC    Tribal council co           0     0    0 
4   TC    The Tribal council Assoctn  10    10   10
7   LS    Longshore association       10    10   0    * name=LS group
8   LS    The Longshore Group, LLC    2     4    8
9   LS    The Longshore Group, llc    15    15   6
run;

proc sql;
  create table want as

  select 
    first_longest_name2.no,
    first_longest_name2.name1,
    first_longest_name2.name2,
    name1_totals.sales,
    name1_totals.inv,
    name1_totals.comp

  FROM

    (
      select 
        no, name1, name2 
      from 
        ( select 
            no, name1, name2 
          from have
          group by name1
          having length(name2) = max(length(name2))
        ) longest_name2s
      group by name1
      having no = min(no)
    ) as
    first_longest_name2

  LEFT JOIN 

    ( 
      select  
        name1,
        sum(sales) as sales,
        sum(inv) as inv,
        sum(comp) as comp
      from 
        have
      group by name1
    ) as
    name1_totals

  ON

    first_longest_name2.name1 = name1_totals.name1
;
quit;

Example (DATA Step)

Processing the data in a serial manner, when name1 groups are contiguous rows, can be accomplished using a DOW loop technique -- that is a loop with a SET statement within it.

data want2;
  do until (last.name1);
    set have;
    by name1 notsorted;

    if length(name2) > longest then do;
      longest = length(name2);
      no_at_longest = no;
      name2_at_longest = name2;
    end;

    sales_sum = sum(sales_sum,sales);
    inv_sum   = sum(inv_sum,inv);
    comp_sum  = sum(comp_sum,comp);
  end;

  drop name2 no sales inv comp longest;

  rename 
    no_at_longest = no
    name2_at_longest = name2
    sales_sum = sales
    inv_sum = inv
    comp_sum = comp
  ;
run;

huangapple
  • 本文由 发表于 2020年1月4日 01:00:05
  • 转载请务必保留本文链接:https://go.coder-hub.com/59582448.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定