Matching observations across two string variables
Question
I have two continuous indicators that are measured at the country-level:
- GDP per capita
- democracy score
I have two string variables that essentially use the same country coding system, such as AFG for Afghanistan. However, I only have 184 observations under the country variable for the GDP data, yet 249 observations under the code variable for the democracy_score data.
I would like to match GDP and democracy score data for observations where the data for both continuous indicators are complete. For instance, the first row below pairs the GDP for AFG with the democracy score for ABW:
"AFG" 2079.9219 "ABW" "0.813"
I would like to match the AFG GDP value with the democracy score from the third row, where the code variable is also "AFG":
"ALB" 13655.665 "AFG" "0.174"
The correct data structure for AFG would then be:
country gdp_adj democracy_score
"AFG" 2079.9219 "0.174"
Here is a data example:
dataex country gdp_adj code democracy_score
output:
* Example generated by -dataex-. For more info, type help dataex
clear
input str3 country float gdp_adj str3 code str5 democracy_score
"AFG" 2079.9219 "ABW" "0.813"
"AGO" 6602.424 "ADO" "#N/A"
"ALB" 13655.665 "AFG" "0.174"
"ARE" 71782.16 "AIA" "#N/A"
"ARG" 22071.75 "ALB" "0.576"
"ARM" 14317.553 "ANT" "#N/A"
"ATG" 23035.66 "ARE" "0.232"
"AUS" 49379.09 "ARG" "0.632"
"AUT" 55806.44 "ARM" "0.496"
"AZE" 14442.04 "ASM" "#N/A"
"BDI" 729.6584 "ATG" "#N/A"
"BEL" 51977.18 "AUS" "0.861"
"BEN" 3156.439 "AUT" "0.852"
"BFA" 2110.0623 "AZE" "0.200"
"BGD" 5467.208 "BDI" "0.170"
"BGR" 23270.225 "BEL" "0.820"
"BHR" 49768.98 "BEN" "0.473"
"BHS" 35161.832 "BFA" "0.358"
"BIH" 14634.738 "BGD" "0.388"
"BLR" 19279.209 "BGR" "0.602"
"BLZ" 9028.552 "BHR" "0.190"
"BOL" 8528.749 "BHS" "0.688"
"BRA" 14685.128 "BIH" "0.399"
end
Answer 1
Score: 2
You can do it by stacking and reshaping back to wide:
destring democracy_score, replace ignore("#N/A")
stack country gdp_adj code democracy_score , into(country outcome) clear
reshape wide outcome, i(country) j(_stack)
rename (outcome1 outcome2) (gdp_adj democracy_score)
I converted the score from string to double under the assumption that you would want to do some analysis on it. If not, then you can tostring it back.
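If the score is needed as a string again after reshaping, a minimal sketch of that conversion might look like the following. The %5.3f format is an assumption based on the three-decimal scores in the example data, and numeric missing values (the former "#N/A" entries) should come out as "." rather than "#N/A".

```stata
* Convert the numeric score back to a string with three decimals.
* force is a precaution in case tostring judges the formatted conversion lossy.
tostring democracy_score, replace format(%5.3f) force
```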
I also had to tweak the GDP storage to double to avoid some precision issues:
input str3 country double gdp_adj str3 code str5 democracy_score
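As an alternative to stacking, the same result can probably be reached with merge. This is only a sketch, not part of the original answer: it assumes the example data is in memory, that each code appears at most once in each column, and that rows beyond the 184 GDP observations have an empty country value. The tempfile name democracy is arbitrary.

```stata
* Turn "#N/A" into missing and make the score numeric
destring democracy_score, replace ignore("#N/A")

* Set aside the democracy columns, keyed by the country code
preserve
keep code democracy_score
drop if missing(code)
rename code country
tempfile democracy
save `democracy'
restore

* Keep the GDP columns and merge the scores back on by country
keep country gdp_adj
drop if missing(country)
merge 1:1 country using `democracy'

* Keep only countries where both indicators are present
keep if _merge == 3 & !missing(gdp_adj, democracy_score)
drop _merge
```

Dropping the final keep line would retain countries that appear in only one of the two sources, with _merge recording where each row came from.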