英文:
Python - Data Transformation
问题
I am trying to convert this "dataframe" into this other "dataframe" in python:
From this:
job_posting,Job_description,rating,company_name,big_data,spark,hadoop,power bi,excel,python,azure,Founded,Location
1,blalbblablabla,3,Exson,1,0,1,,,,0,2007,US
2,blalbblablabla,4,1,,,,1,1,,0,2010,EU
3,blalbblablabla,0,Wine20,1,1,,,,1,1,2000,LA
To this:
job_posting,Job_description,rating,company_name,technologies_required,Founded,Location
1,blalbblablabla,3,Exson,big_data, hadoop,2007,US
2,blalbblablabla,4,1,power bi, excel,2010,EU
3,blalbblablabla,0,Wine20,big_data, spark, python, azure,2000,LA
This is what it could look like:
Any help will be appreciated.
New revised code. @Timeless thanks!
Create "technologies_required"
df['technologies_required'] = df.apply(lambda x: ", ".join(x.index[x.eq(1) & ~x.index.isin(['job_posting', 'for_example_rating'])]), axis=1)
Reorder columns
new_df = df[['job_posting', 'for_example_rating', 'technologies_required']]
Now, it doesn't matter if any other column includes a number 1 in their rows.
Unfortunately I don't know if this is good or bad practice
英文:
I am trying to convert this "dataframe" into this other "dataframe" in python:
From this:
job_posting,Job_description,rating,company_name,big_data,spark,hadoop,power bi,excel,python,azure,Founded,Location
1,blalbblablabla,3,Exson,1,0,1,,,,0,2007,US
2,blalbblablabla,4,1,,,,1,1,,0,2010,EU
3,blalbblablabla,0,Wine20,1,1,,,,1,1,2000,LA
To this:
job_posting,Job_description,rating,company_name,technologies_required,Founded,Location
1,blalbblablabla,3,Exson,big_data, hadoop,2007,US
2,blalbblablabla,4,1,power bi, excel,2010,EU
3,blalbblablabla,0,Wine20,big_data, spark, python, azure,2000,LA
This is what it could look like:
Any help will be appreciated.
::::::::::::::::::::::::::::::::::::::::::::::::::
New revised code. @Timeless thanks!
# Create "technologies_required"
df['technologies_required'] = df.apply(lambda x: ", ".join(x.index[x.eq(1) & ~x.index.isin(['job_posting', 'for_example_rating'])]), axis=1)
# Reorder columns
new_df = df[['job_posting', 'for_example_rating', 'technologies_required']]
Now, it doesn't matter if any other column includes a number 1 in their rows.
Unfortunately I don't know if this is good or bad practice
答案1
得分: 1
techs = df.apply(lambda x: ", ".join(x.index[x.eq(1)]), axis=1)
#techs = df.apply(lambda x: ", ".join(x.eq(1).loc[lambda s: s].index), axis=1) #变体
out = df[["job_posting"]].join(techs.rename("technologies_required"))
输出:
print(out)
job_posting technologies_required
0 5674 big_data, spark
1 13037 excel
2 4377 powerbi, azure
更新:
> 这段代码存在一个小问题:如果 job_posting
或其他任何列(除了 techs
)的值为1,那么该列也会被包括在内。
lcols = ["job_posting", "lob_description", "rating", "company_name"]
rcols = ["Founded", "Location"]
techs = df.drop(columns=lcols+rcols).apply(lambda x: ", ".join(x.index[x.eq(1)]), axis=1)
out = pd.concat([df[lcols], techs.rename("technologies_required"), df[rcols]], axis=1)
输出:
print(out)
job_posting lob_description rating company_name technologies_required Founded Location
0 1 blalbblablabla 3 Exson big data, hadoop 2007 US
1 2 blalbblablabla 1 1 excel, power bi 2010 EU
2 3 blalbblablabla 0 Wine20 azure, big data, python, spark 2000 LA
英文:
You can use apply
to boolean-index the columns names :
techs = df.apply(lambda x: ", ".join(x.index[x.eq(1)]), axis=1)
#techs = df.apply(lambda x: ", ".join(x.eq(1).loc[lambda s: s].index), axis=1) #variant
out = df[["job_posting"]].join(techs.rename("technologies_required"))
Output :
print(out)
job_posting technologies_required
0 5674 big_data, spark
1 13037 excel
2 4377 powerbi, azure
Update :
> There is a little problem in this code: If job_posting or any other
> column (not techs) has a value of 1, then that column is included as
> well.
lcols = ["job_posting", "lob_description", "rating", "company_name"]
rcols = ["Founded", "Location"]
techs = df.drop(columns=lcols+rcols).apply(lambda x: ", ".join(x.index[x.eq(1)]), axis=1)
out = pd.concat([df[lcols], techs.rename("technologies_required"), df[rcols]], axis=1)
Output :
print(out)
job_posting lob_description rating company_name technologies_required Founded Location
0 1 blalbblablabla 3 Exson big data, hadoop 2007 US
1 2 blalbblablabla 1 1 excel, power bi 2010 EU
2 3 blalbblablabla 0 Wine20 azure, big data, python, spark 2000 LA
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论