2023-03-31 22:56:10

Extracting values after a split to create a new column with a yes or no in Python

Question

sampleID	comorbidities
P01	hypertension, diabetes
P02	hypertension, diabetes
P03	diabetes
P04	CHD, asthma
P05	asthma, hypertension

Hello, I am new to coding and am currently doing some data cleaning in Python. I want to break my data apart so that I can analyze it more easily. A few of my columns contain multiple values in a single string. For example, one column lists a patient's comorbidities, and some patients have several. I am trying to split these strings so that each comorbidity gets its own column with a simple yes/no or 1/0 for each patient. I am unable to post pictures, so I recreated the tables.

Currently I have one column that has multiple strings contained within it. I split the column using:
df1 = pd.concat((df, df['comorbidities'].str.split(',', expand = True)), axis = 1, ignore_index = True)

The resulting dataframe looks like this:

0	1	2	3
P01	hypertension, diabetes	hypertension	diabetes
P02	hypertension, diabetes	hypertension	diabetes
P03	diabetes	diabetes	None
P04	CHD, asthma	CHD	asthma
P05	asthma, hypertension	asthma	hypertension
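
A side note on the split above, as a minimal sketch on the dummy data: splitting on ',' alone keeps the space that follows each comma, so the second piece is ' diabetes' rather than 'diabetes', which is easy to miss in a printed table. Splitting on ', ' avoids that. The frame below is a hypothetical reconstruction of the sample table, and the names raw and clean are illustrative only.

```python
import pandas as pd

# Hypothetical reconstruction of the dummy table from the question.
df = pd.DataFrame({
    "sampleID": ["P01", "P02", "P03", "P04", "P05"],
    "comorbidities": [
        "hypertension, diabetes",
        "hypertension, diabetes",
        "diabetes",
        "CHD, asthma",
        "asthma, hypertension",
    ],
})

# Splitting on "," keeps the leading space of every item after the first.
raw = df["comorbidities"].str.split(",", expand=True)

# Splitting on ", " (comma plus space) drops that leading space.
clean = df["comorbidities"].str.split(", ", expand=True)

print(repr(raw.iloc[0, 1]))
print(repr(clean.iloc[0, 1]))
```

Rows with fewer items than the widest row are padded with missing values, which matches the None shown for P03.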

After this, I want to turn the split strings into new columns that contain either yes/no or 1/0, so that each sample shows whether the patient has that condition. Any suggestions as to how to do this? I have tried groupby on just one column and on all the columns, and it does not work. I can't share the actual data, but I created a dummy dataset with an example, and the output I want is below.

sampleID	comorbidities	hypertension	diabetes	CHD	asthma
P01	hypertension, diabetes	yes	yes	no	no
P02	hypertension, diabetes	yes	yes	no	no
P03	diabetes	no	yes	no	no
P04	CHD, asthma	no	no	yes	yes
P05	asthma, hypertension	yes	no	no	yes

For example, what I am trying to do is take hypertension and create a new column with the name hypertension, and a simple yes/no or 1/0 for each sampleID. Any suggestions would be greatly appreciated!

Answer 1

Score: 0

Use str.get_dummies combined with replace and join:

out = df.join(df['comorbidities'].str.get_dummies(', ')
                                 .replace({0: 'no', 1: 'yes'}))

Output:

  sampleID           comorbidities  CHD asthma diabetes hypertension
0      P01  hypertension, diabetes   no     no      yes          yes
1      P02  hypertension, diabetes   no     no      yes          yes
2      P03                diabetes   no     no      yes           no
3      P04             CHD, asthma  yes    yes       no           no
4      P05    asthma, hypertension   no    yes       no          yes
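
The question also mentions 1/0 as an acceptable result. Since str.get_dummies already returns 0/1 integer columns, the replace step can simply be skipped in that case. The following is a minimal, self-contained sketch under a few assumptions: the frame is a hypothetical reconstruction of the dummy table, every entry uses comma plus space as the separator, and the names flags, out_numeric and out_yes_no are illustrative. The column list is written out by hand to match the order in the desired output instead of the sorted order shown above.

```python
import pandas as pd

# Hypothetical reconstruction of the dummy table from the question.
df = pd.DataFrame({
    "sampleID": ["P01", "P02", "P03", "P04", "P05"],
    "comorbidities": [
        "hypertension, diabetes",
        "hypertension, diabetes",
        "diabetes",
        "CHD, asthma",
        "asthma, hypertension",
    ],
})

# One 0/1 integer column per distinct item in the delimited strings.
flags = df["comorbidities"].str.get_dummies(sep=", ")

# Optional: choose the column order explicitly (assumes these four labels exist).
flags = flags[["hypertension", "diabetes", "CHD", "asthma"]]

# 1/0 version: join the indicator columns as they are.
out_numeric = df.join(flags)

# yes/no version: map the integers to labels before joining.
out_yes_no = df.join(flags.replace({0: "no", 1: "yes"}))

print(out_numeric)
print(out_yes_no)
```

If the real data mixes separators such as ',' and ', ', the strings would need normalizing first, otherwise items like 'diabetes' and ' diabetes' end up in separate columns.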
