May 13, 2023 21:50:28

Python - Data Transformation

Question

I am trying to convert this "dataframe" into this other "dataframe" in python:

From this:

job_posting,Job_description,rating,company_name,big_data,spark,hadoop,power bi,excel,python,azure,Founded,Location
1,blalbblablabla,3,Exson,1,0,1,,,,0,2007,US
2,blalbblablabla,4,1,,,,1,1,,0,2010,EU
3,blalbblablabla,0,Wine20,1,1,,,,1,1,2000,LA

To this:

job_posting,Job_description,rating,company_name,technologies_required,Founded,Location
1,blalbblablabla,3,Exson,big_data, hadoop,2007,US
2,blalbblablabla,4,1,power bi, excel,2010,EU
3,blalbblablabla,0,Wine20,big_data, spark, python, azure,2000,LA

This is what it could look like:

Any help will be appreciated.

Revised code (thanks, @Timeless!):

# Create "technologies_required", skipping columns that are not technologies
df['technologies_required'] = df.apply(lambda x: ", ".join(x.index[x.eq(1) & ~x.index.isin(['job_posting', 'rating'])]), axis=1)

# Reorder columns
new_df = df[['job_posting', 'rating', 'technologies_required']]

Now a 1 in any of the excluded columns (the ones passed to isin) is no longer picked up as a technology.

Unfortunately, I don't know whether this is good or bad practice.
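A minimal, self-contained sketch of the revised approach run on the sample data from the question. It assumes the data is loaded with `pd.read_csv` from an in-memory string, and that the placeholder `for_example_rating` stands for the `rating` column; `excluded` is a hypothetical name introduced for readability.

```python
import io

import pandas as pd

# Sample data from the question; empty cells are read as NaN.
CSV = """job_posting,Job_description,rating,company_name,big_data,spark,hadoop,power bi,excel,python,azure,Founded,Location
1,blalbblablabla,3,Exson,1,0,1,,,,0,2007,US
2,blalbblablabla,4,1,,,,1,1,,0,2010,EU
3,blalbblablabla,0,Wine20,1,1,,,,1,1,2000,LA
"""
df = pd.read_csv(io.StringIO(CSV))

# Hypothetical list of non-technology columns that may still contain a 1
# (assumes 'for_example_rating' in the question means 'rating').
excluded = ["job_posting", "rating"]

# For each row: labels whose cell equals 1, minus the excluded labels.
df["technologies_required"] = df.apply(
    lambda x: ", ".join(x.index[x.eq(1) & ~x.index.isin(excluded)]), axis=1
)
new_df = df[["job_posting", "rating", "technologies_required"]]
print(new_df)
```

Row 1 has `job_posting == 1`, so removing `"job_posting"` from `excluded` would make it show up as a technology. `company_name` is read as text, so the `"1"` in row 2 does not compare equal to the integer 1 and is not matched.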

Answer 1

Score: 1

You can use apply to boolean-index the column names:

# For each row, keep the labels of the cells that equal 1 and join them
techs = df.apply(lambda x: ", ".join(x.index[x.eq(1)]), axis=1)
#techs = df.apply(lambda x: ", ".join(x.eq(1).loc[lambda s: s].index), axis=1) #variant
out = df[["job_posting"]].join(techs.rename("technologies_required"))
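To see what the lambda works with, here is a sketch on one hypothetical row (values made up for illustration). With `axis=1`, `apply` passes each row as a Series whose index holds the column names, so the boolean mask from `eq(1)` selects labels rather than values; the commented-out variant gives the same labels.

```python
import pandas as pd

# One hypothetical row, shaped like what apply(axis=1) hands to the lambda:
# the Series index holds the column names.
row = pd.Series({"big_data": 1.0, "spark": 0.0, "hadoop": float("nan"), "azure": 1.0})

mask = row.eq(1)                            # True where the cell equals 1; NaN gives False
labels = row.index[mask]                    # boolean-index the column names
variant = row.eq(1).loc[lambda s: s].index  # the commented-out variant

print(", ".join(labels))            # big_data, azure
print(list(labels) == list(variant))  # True
```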

Output:

print(out)
   job_posting technologies_required
0         5674       big_data, spark
1        13037                 excel
2         4377        powerbi, azure

Update:

> There is a little problem in this code: if job_posting or any other column (not techs) has a value of 1, then that column is included as well.
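The problem described in the comment is easy to reproduce. A sketch with a small hypothetical frame (values made up for illustration) in which `job_posting` and `rating` happen to hold a 1; `tech_cols` is a hypothetical name for the list of technology columns.

```python
import pandas as pd

# Hypothetical frame: job_posting is 1 in the first row, rating is 1 in the second.
df = pd.DataFrame({
    "job_posting": [1, 2],
    "rating": [3, 1],
    "big_data": [1, 0],
    "excel": [0, 1],
})

# Searching every column lets non-technology columns leak into the result.
naive = df.apply(lambda x: ", ".join(x.index[x.eq(1)]), axis=1)
print(naive.tolist())  # ['job_posting, big_data', 'rating, excel']

# Restricting the search to the technology columns avoids the leak.
tech_cols = ["big_data", "excel"]  # hypothetical list of technology columns
fixed = df[tech_cols].apply(lambda x: ", ".join(x.index[x.eq(1)]), axis=1)
print(fixed.tolist())  # ['big_data', 'excel']
```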

import pandas as pd

lcols = ["job_posting", "Job_description", "rating", "company_name"]  # left of the techs
rcols = ["Founded", "Location"]  # right of the techs
# Search for 1s only in the technology columns, then reassemble in the original order
techs = df.drop(columns=lcols+rcols).apply(lambda x: ", ".join(x.index[x.eq(1)]), axis=1)
out = pd.concat([df[lcols], techs.rename("technologies_required"), df[rcols]], axis=1)

Output:

print(out)
   job_posting Job_description  rating company_name           technologies_required  Founded Location
0            1  blalbblablabla       3        Exson                big data, hadoop     2007       US
1            2  blalbblablabla       1            1                 excel, power bi     2010       EU
2            3  blalbblablabla       0       Wine20  azure, big data, python, spark     2000       LA
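If `apply` becomes slow on a large frame, one commonly used vectorized alternative (a different technique from the answer's `apply`) is `DataFrame.dot` between a boolean mask and the column labels: on object data, each True cell contributes its label string and each False cell contributes an empty string. A sketch on the question's sample data, assuming it is loaded from an in-memory CSV string; the trailing separator is stripped afterwards with `str.rstrip`.

```python
import io

import pandas as pd

# Sample data from the question; empty cells are read as NaN.
CSV = """job_posting,Job_description,rating,company_name,big_data,spark,hadoop,power bi,excel,python,azure,Founded,Location
1,blalbblablabla,3,Exson,1,0,1,,,,0,2007,US
2,blalbblablabla,4,1,,,,1,1,,0,2010,EU
3,blalbblablabla,0,Wine20,1,1,,,,1,1,2000,LA
"""
df = pd.read_csv(io.StringIO(CSV))

lcols = ["job_posting", "Job_description", "rating", "company_name"]
rcols = ["Founded", "Location"]
tech = df.drop(columns=lcols + rcols)

# Boolean matrix (rows x techs) dotted with the labels: every True cell
# contributes "label, " and every False cell contributes "".
techs = tech.eq(1).dot(tech.columns + ", ").str.rstrip(", ")

out = pd.concat([df[lcols], techs.rename("technologies_required"), df[rcols]], axis=1)
print(out)
```

The labels are joined in the order the columns appear in the frame. Note that `rstrip(", ")` removes any trailing commas and spaces, so a technology label that itself ends in a space or comma would be truncated.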
