Concatenate and sort at the same time
Question
I have a rather simple task which I am struggling with in Stata.
I have three variables, SicTwo1, SicTwo2, and SicTwo3, which are numeric (e.g. 12, 25, and 16).
I now want to concatenate them into a new variable, SicAndSicAndSic, but they should be ordered from lowest to highest value (e.g. "121625"), ideally with a separator (e.g. "12&16&25").
I tried this code:
gen NumSics = 0
replace NumSics = NumSics + 1 if !missing(SicTwo1)
replace NumSics = NumSics + 1 if !missing(SicTwo2) & SicTwo1 != SicTwo2
replace NumSics = NumSics + 1 if !missing(SicTwo3) & SicTwo1 != SicTwo2 & SicTwo1 != SicTwo3 & SicTwo2 != SicTwo3
sort SicTwo1 SicTwo2 SicTwo3
gen SicAndSic1 = string(SicTwo1)
gen SicAndSic2 = string(SicTwo1) + "&" + string(SicTwo2)
gen SicAndSic3 = string(SicTwo1) + "&" + string(SicTwo2) + "&" + string(SicTwo3)
gen SicAndSic = ""
replace SicAndSic = SicAndSic1 if NumSics == 1
replace SicAndSic = SicAndSic2 if NumSics == 2
replace SicAndSic = SicAndSic3 if NumSics == 3
But it does not sort the variables, and just puts them next to each other.
Answer 1
Score: 0
See https://www.stata-journal.com/article.html?article=pr0046 for one way to sort variables within observations (rows).
clear
input SicTwo1 SicTwo2 SicTwo3
12 16 25
23 12 11
99 88 11
end
rowsort SicTwo?, gen(S1 S2 S3)
egen wanted = concat(S?) , p(&)
list
     +-------------------------------------------------------+
     | SicTwo1   SicTwo2   SicTwo3   S1   S2   S3     wanted |
     |-------------------------------------------------------|
  1. |      12        16        25   12   16   25   12&16&25 |
  2. |      23        12        11   11   12   23   11&12&23 |
  3. |      99        88        11   11   88   99   11&88&99 |
     +-------------------------------------------------------+
Your code shows a misunderstanding of sort, which sorts observations by the values of variables but emphatically does not sort within observations. That is precisely what rowsort does, with the proviso that the original variables are unchanged and the results go into new variables.
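If installing a user-written command is not an option, the same row-wise ordering can be sketched with built-in functions for the special case of exactly three variables. This is only an illustration, not part of rowsort: the names smin, smid, smax, and wanted_builtin are made up here, and the sketch assumes none of the three values is missing.

```stata
* Sketch: order three numeric values within each observation using only
* built-in functions. Assumption: SicTwo1-SicTwo3 have no missing values.
clear
input SicTwo1 SicTwo2 SicTwo3
12 16 25
23 12 11
99 88 11
end

* smallest and largest value in each row
gen smin = min(SicTwo1, SicTwo2, SicTwo3)
gen smax = max(SicTwo1, SicTwo2, SicTwo3)

* the row sum minus the smallest and the largest leaves the middle value
gen smid = SicTwo1 + SicTwo2 + SicTwo3 - smin - smax

* join the ordered values with "&" as separator
gen wanted_builtin = string(smin) + "&" + string(smid) + "&" + string(smax)
list
```

The arithmetic trick does not generalise beyond three variables, which is where a dedicated row-sorting command becomes the more practical choice.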
Your variables are stated to be numeric, so any missing values will by default be sorted to high. If you want something else, you need to spell out what that is.
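As one possible way of spelling that out, the sketch below leaves missing values out of the concatenated string entirely, so the separator only appears between values that are present. It assumes rowsort is installed (it is user-written, distributed with the Stata Journal article linked above) and that skipping missings is the behaviour you want; the variable name wanted_skip and the example data are invented for illustration.

```stata
* Sketch: concatenate row-sorted values, skipping missing ones.
* Assumption: rowsort (user-written) is installed.
clear
input SicTwo1 SicTwo2 SicTwo3
12 16 25
23  . 11
 .  . 88
end

rowsort SicTwo?, gen(S1 S2 S3)

* start from an empty string and append each nonmissing value in turn;
* "&" is only added when something has already been appended
gen wanted_skip = ""
foreach v of varlist S1 S2 S3 {
    replace wanted_skip = wanted_skip + cond(wanted_skip == "", "", "&") + string(`v') if !missing(`v')
}
list
```

Because the loop tests each value with missing(), the result does not depend on whether missings were sorted to the high or the low end.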