How to replace values by labels in data.frames from SPSS files?

Question

I have to read a .sav file, and I use the haven package:

    library(haven)
    dataset <- read_sav("datafile.sav")

In the console I can see the labels:

    dput(head(voyages$portdep))
    structure(c(50422, 50299, 50299, 50299, NA, NA), label = "Port of departure", labels = c(Alicante = 10101,
    Barcelona = 10102, Bilbao = 10103, Cadiz = 10104, Figuera = 10105,
    Gibraltar = 10106, `La Coruña` = 10107, Santander = 10110, Seville = 10111,
    `San Lucar` = 10112, Vigo = 10113, `Spain, port unspecified` = 10199,
    Lagos = 10202, Lisbon = 10203, Oporto = 10204, `Ilho do Fayal` = 10205,
    Setubal = 10206, `Portugal, port unspecified` = 10299, `Great Britain, port unspecified` = 10399,
    Barmouth = 10401, Bideford = 10402, Birkenhead = 10403, Bristol = 10404,
    Brixham = 10405, Broadstairs = 10406, Cawsand = 10407, Chepstow = 10408,
    Chester = 10409, Colchester = 10410, Cowes = 10411, Dartmouth = 10412,
    Deptford = 10413, Dover = 10414, Exeter = 10415, Folkstone = 10416,
    Frodsham = 10417, Gainsborough = 10418, Greenwich = 10419, Guernsey = 10420,
    Harwich = 10421, Hull = 10422, Ilfracombe = 10423, Ipswich = 10424,
    `Isle of Man` = 10425, `Isle of Wight` = 10426, Jersey = 10427,
    Kendal = 10428, `King's Lynn` = 10429, Lancaster = 10430, Lindale = 10431,
    Liverpool = 10432, London = 10433, Lyme = 10434, Maryport = 10436,
    `Milford Haven` = 10437, `New Shoreham` = 10438, `Newcastle upon Tyne` = 10439,
    Newnham = 10440, `North Shields` = 10441, Norwich = 10443, Padstowe = 10444,
    Parkgate = 10445, `Piel of Foulney` = 10446, Plymouth = 10447,
    Poole = 10448, Portsery = 10449, Portsmouth = 10450, Poulton = 10451,
    Preston = 10452, Ramsgate = 10453, Ravenglass = 10454, `River Thames` = 10455,
    Rochester = 10456, Rotherhithe = 10457, Rye = 10458, Scarborough = 10459,
    Sheerness = 10460, Shields = 10461, Shoreham = 10462, Sidmouth = 10463,
    Southampton = 10464, Stockton = 10466, Stockwithe = 10467, Sunderland = 10468,
    Teignmouth = 10469, Topsham = 10470, Torbay = 10471, Wales = 10472,

In the HTML table, I only get the values.

How can I replace values with labels in data frames read from SPSS files, so that the labels are displayed in an HTML table?

Using the sjlabelled package, I can get the labels of any column:

    library(sjlabelled)
    get_labels(voyages$portdep)

     [1] "Alicante" "Barcelona" "Bilbao" "Cadiz"
     [5] "Figuera" "Gibraltar" "La Coruña" "Santander"
     [9] "Seville" "San Lucar" "Vigo" "Spain, port unspecified"
    [13] "Lagos" "Lisbon" "Oporto" "Ilho do Fayal"
    [17] "Setubal" "Portugal, port unspecified" "Great Britain, port unspecified" "Barmouth"
    [21] "Bideford" "Birkenhead" "Bristol" "Brixham"
    [25] "Broadstairs" "Cawsand" "Chepstow" "Chester"
    [29] "Colchester" "Cowes" "Dartmouth" "Deptford"
    [33] "Dover" "Exeter" "Folkstone" "Frodsham"
    [37] "Gainsborough" "Greenwich" "Guernsey" "Harwich"
    [41] "Hull" "Ilfracombe" "Ipswich" "Isle of Man"
    [45] "Isle of Wight" "Jersey" "Kendal" "King's Lynn"

I tried:

On a single column:

    dataset2 <- dataset %>% mutate(portdep = get_labels(portdep))

> Error: Column portdep must be length 36002 (the number of rows) or
> one, not 847

On the whole data frame:

    dataset2 <- dataset %>% mutate_all(funs(get_labels(.)))

This fails with the same error on the first column:

> Column xxx must be length 36002 (the number of rows) or one, not 2

Answer 1

Score: 5

I think you can get what you're looking for by using haven::as_factor.

Does this work?

    library(haven)
    library(dplyr)

    # Convert every column with as_factor(), then preview the first rows
    dataset %>%
      mutate_all(as_factor) %>%
      head() %>%
      View()
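
A minimal sketch of what as_factor() does here, assuming a small hand-built column instead of the real file: the `voyages` data frame, its `id` column and the three labels below are made up for illustration. It also shows why the attempts in the question fail: there is one label per defined code, not one per row.

```r
library(haven)

# Hypothetical stand-in for a column read by read_sav(): three labelled codes
# plus one code (50422) that has no label defined.
portdep <- labelled(
  c(10102, 10101, 50422, NA, 10102),
  labels = c(Alicante = 10101, Barcelona = 10102, Bilbao = 10103),
  label = "Port of departure"
)
voyages <- data.frame(id = 1:5)
voyages$portdep <- portdep

# One entry per defined label (3), not one per row (5): this is the length
# mismatch that mutate() complained about in the question.
length(attr(portdep, "labels"))

# as_factor() maps every row to its label. The data frame method only
# converts labelled columns, so `id` keeps its original type.
converted <- as_factor(voyages)
as.character(converted$portdep)
```

Calling as_factor() on the whole data frame, rather than through mutate_all(), may be preferable when some columns are plain numbers that should not become factors.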

Answer 2

Score: 2

Instead of using the haven package, you could try foreign. I used my own data, try.sav, which includes a variable gender:

    library(haven)
    df_haven <- read_sav("try.sav")
    class(df_haven$gender)
    #> [1] "haven_labelled"
    table(df_haven$gender)
    #>
    #>    1    2
    #> 1972 2417
    df_haven$gender
    #> <Labelled double>: Gender
    #> [1] 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2 2 1 1 1 1 2 2 2 2 2 2 2
    #> [38] 2 2 2 1 2 2 1 2 2 2 2 2 2 2 2 1 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    #> [75] 2 2 1 1 2 2 1 2 1 2 1 1 2 1 2 1 1 2 2 2 2 2 1 1 2 2 1 2 1 2 2 2 1 1 2 2 1
    #> ...
    #> Labels:
    #>  value  label
    #>      1   male
    #>      2 female
    library(foreign)
    df_foreign <- read.spss("try.sav", to.data.frame = TRUE)
    #> re-encoding from UTF-8
    class(df_foreign$gender)
    #> [1] "factor"
    table(df_foreign$gender)
    #>
    #>   male female
    #>   1972   2417
    df_foreign$gender
    #> [1] female female female male female female female female female female
    #> [11] female female female male female female female female female male
    #> [21] female female female female female female male male male male
    #> [31] female female female female female female female female female female
    #> [41] male female female male female female female female female female
    #> [51] female female male female female female female male female female
    #> [61] female female female female female female female female female female
    #> [71] female female female female female female male male female female
    #> [81] male female male female male male female male female male
    #> [91] male female female female female female male male female female
    #> ...
    #> Levels: male female

Created on 2020-01-06 by the reprex package (v0.3.0)

Answer 3

Score: 1

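You could also use as_factor() from the haven package:

    library(haven)
    # df_haven is the data read with read_sav() in the previous answer;
    # df_foreign$gender is already a factor, so it needs no conversion
    as_factor(df_haven$gender)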

That should work! Good luck!
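
If only text is needed for the HTML table and a factor is not wanted, the same lookup can be sketched in base R without any package. The codes, the labels and the helper name `codes_to_text` below are made up for illustration; the helper falls back to the raw code when no label is defined.

```r
# Hypothetical codes and value labels, laid out the way haven stores them:
# the codes in the vector, the value labels as a named vector.
codes <- c(10102, 10101, 50422, NA, 10102)
value_labels <- c(Alicante = 10101, Barcelona = 10102, Bilbao = 10103)

# Hypothetical helper: find each row's code among the labelled values and
# fall back to the raw code (as text) when no label is defined.
codes_to_text <- function(x, labels) {
  idx <- match(x, labels)
  out <- names(labels)[idx]
  missing_label <- is.na(idx) & !is.na(x)
  out[missing_label] <- as.character(x[missing_label])
  out
}

port_text <- codes_to_text(codes, value_labels)
print(port_text)
```

For a column read with haven, the labels would presumably come from `attr(column, "labels")` and the codes from `unclass(column)`; that wiring is an assumption and is not tested here.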

huangapple
  • Posted on 2020-01-06 17:39:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59609733.html