Matching observations across two string variables

Question

I have two continuous indicators that are measured at the country level:

  1. GDP per capita
  2. democracy score

I also have two string variables, country and code, that use essentially the same country coding system (e.g., AFG for Afghanistan). However, the GDP data has only 184 observations under country, while the democracy_score data has 249 observations under code.

I would like to match the GDP and democracy score data by country code, keeping only the countries for which both indicators are available. For instance, the first row of the data is:

"AFG" 2079.9219 "ABW" "0.813"

I would like to match that GDP value with the democracy score in the third row, where code has the same value, "AFG":

"ALB" 13655.665 "AFG" "0.174"

The desired result for AFG would be:

country gdp_adj democracy_score 
"AFG" 2079.9219 "0.174"

Here is a data example:

dataex country gdp_adj code democracy_score 

output:

* Example generated by -dataex-. For more info, type help dataex
clear
input str3 country float gdp_adj str3 code str5 democracy_score
"AFG" 2079.9219 "ABW" "0.813"
"AGO"  6602.424 "ADO" "#N/A" 
"ALB" 13655.665 "AFG" "0.174"
"ARE"  71782.16 "AIA" "#N/A" 
"ARG"  22071.75 "ALB" "0.576"
"ARM" 14317.553 "ANT" "#N/A" 
"ATG"  23035.66 "ARE" "0.232"
"AUS"  49379.09 "ARG" "0.632"
"AUT"  55806.44 "ARM" "0.496"
"AZE"  14442.04 "ASM" "#N/A" 
"BDI"  729.6584 "ATG" "#N/A" 
"BEL"  51977.18 "AUS" "0.861"
"BEN"  3156.439 "AUT" "0.852"
"BFA" 2110.0623 "AZE" "0.200"
"BGD"  5467.208 "BDI" "0.170"
"BGR" 23270.225 "BEL" "0.820"
"BHR"  49768.98 "BEN" "0.473"
"BHS" 35161.832 "BFA" "0.358"
"BIH" 14634.738 "BGD" "0.388"
"BLR" 19279.209 "BGR" "0.602"
"BLZ"  9028.552 "BHR" "0.190"
"BOL"  8528.749 "BHS" "0.688"
"BRA" 14685.128 "BIH" "0.399"
end
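For comparison with the stack/reshape answer below, the same key-based matching can be sketched with Stata's merge: split the two column pairs into separate datasets keyed on the country code, then merge them 1:1. This is a minimal sketch on the first five rows of the dataex listing above, assuming each code appears at most once per column; the tempfile name democ is arbitrary, and gdp_adj is read as double (as the answer recommends).

```stata
* First five rows of the dataex example; gdp_adj read as double
clear
input str3 country double gdp_adj str3 code str5 democracy_score
"AFG" 2079.9219 "ABW" "0.813"
"AGO"  6602.424 "ADO" "#N/A"
"ALB" 13655.665 "AFG" "0.174"
"ARE"  71782.16 "AIA" "#N/A"
"ARG"  22071.75 "ALB" "0.576"
end

* Numeric score: stripping the #, N, /, A characters leaves "#N/A" empty, i.e. missing
destring democracy_score, replace ignore("#N/A")

* Save the democracy columns as their own dataset, keyed on the code
preserve
keep code democracy_score
rename code country
drop if missing(country)
tempfile democ
save `democ'
restore

* Keep the GDP columns and attach scores where the codes match
keep country gdp_adj
drop if missing(country)
merge 1:1 country using `democ', keep(match) nogenerate

* Keep only countries with both indicators observed
drop if missing(gdp_adj) | missing(democracy_score)
list
```

On these five rows only AFG and ALB have both a GDP value and a non-missing score, so two observations remain. keep(match) discards codes that appear in only one of the two columns.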

Answer 1

Score: 2

You can do it by stacking the two code/value pairs into long form and reshaping back to wide:

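* Convert the score to numeric; "#N/A" cells become missing
destring democracy_score, replace ignore("#N/A")
* Stack (country, gdp_adj) over (code, democracy_score); _stack = 1 or 2
stack country gdp_adj code democracy_score , into(country outcome) clear
* One row per country code: outcome1 = GDP, outcome2 = democracy score
reshape wide outcome, i(country) j(_stack)
rename (outcome1 outcome2) (gdp_adj democracy_score)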
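The code above keeps every code that appears in either column, so countries with only one indicator remain with a missing value. Since the question asks for complete cases only, a final drop restricts the result. The sketch below (on the first five example rows) also drops blank codes right after stacking; this assumes, without confirmation from the full data, that the 184 GDP rows are padded with empty strings up to 249. Such blank rows would repeat the same country/_stack pair, and reshape wide refuses to run when i and j do not uniquely identify observations.

```stata
clear
input str3 country double gdp_adj str3 code str5 democracy_score
"AFG" 2079.9219 "ABW" "0.813"
"AGO"  6602.424 "ADO" "#N/A"
"ALB" 13655.665 "AFG" "0.174"
"ARE"  71782.16 "AIA" "#N/A"
"ARG"  22071.75 "ALB" "0.576"
end

destring democracy_score, replace ignore("#N/A")
stack country gdp_adj code democracy_score, into(country outcome) clear
* Blank codes (assumed padding rows) would break reshape's unique i-j requirement
drop if missing(country)
reshape wide outcome, i(country) j(_stack)
rename (outcome1 outcome2) (gdp_adj democracy_score)
* Keep only countries observed in both sources
drop if missing(gdp_adj) | missing(democracy_score)
list
```

Before the final drop there are eight codes (ABW, ADO, AFG, AGO, AIA, ALB, ARE, ARG); afterwards only AFG and ALB remain.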

I converted the score from string to double on the assumption that you want to analyze it. If not, you can convert it back with tostring.

I also had to change the storage type of gdp_adj to double in the dataex input to avoid some precision issues:

input str3 country double gdp_adj str3 code str5 democracy_score
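The answer does not say which precision issue came up; a likely candidate is that a float holds only about seven significant digits, so GDP values read as float no longer equal their typed decimal values once they sit next to double data. A small check with Stata's float() function, which rounds a number to float precision:

```stata
* 2079.9219 rounded to float precision vs. kept as double
display %12.6f float(2079.9219)
display %12.6f 2079.9219
* 1 if equal, 0 otherwise
display float(2079.9219) == 2079.9219
```

Near 2079 the spacing between adjacent floats is 2^-12, so the nearest float to 2079.9219 is 2079.921875; reading gdp_adj as double keeps the value as typed.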

huangapple
  • Posted on 14 February 2023 at 06:58:16
  • If reposting, please keep the link to this article: https://go.coder-hub.com/75441967.html