Python – Data Transformation


Question

I am trying to convert this DataFrame into this other DataFrame in Python:

From this:

job_posting,Job_description,rating,company_name,big_data,spark,hadoop,power bi,excel,python,azure,Founded,Location
1,blalbblablabla,3,Exson,1,0,1,,,,0,2007,US
2,blalbblablabla,4,1,,,,1,1,,0,2010,EU
3,blalbblablabla,0,Wine20,1,1,,,,1,1,2000,LA

To this:

job_posting,Job_description,rating,company_name,technologies_required,Founded,Location
1,blalbblablabla,3,Exson,"big_data, hadoop",2007,US
2,blalbblablabla,4,1,"power bi, excel",2010,EU
3,blalbblablabla,0,Wine20,"big_data, spark, python, azure",2000,LA

This is what it could look like: [image of the desired table]

Any help will be appreciated.
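To try the approaches below, the sample can be loaded straight from the CSV text. This is a minimal sketch, assuming pandas is installed; the variable name CSV is only for illustration. Empty cells are parsed as NaN, so technology columns that contain blanks come back as floats, and the second row's company_name stays the string "1", which x.eq(1) does not match.

```python
import io

import pandas as pd

# Sample data copied from the question (the name CSV is illustrative).
CSV = """job_posting,Job_description,rating,company_name,big_data,spark,hadoop,power bi,excel,python,azure,Founded,Location
1,blalbblablabla,3,Exson,1,0,1,,,,0,2007,US
2,blalbblablabla,4,1,,,,1,1,,0,2010,EU
3,blalbblablabla,0,Wine20,1,1,,,,1,1,2000,LA
"""

# Empty fields are parsed as NaN, which turns those columns into floats.
df = pd.read_csv(io.StringIO(CSV))
print(df.dtypes)
```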

Edit: here is my revised code (thanks, @Timeless!):

# Create "technologies_required": join the labels of the cells equal to 1, skipping the listed columns
df['technologies_required'] = df.apply(lambda x: ", ".join(x.index[x.eq(1) & ~x.index.isin(['job_posting', 'for_example_rating'])]), axis=1)

# Keep only these columns, in this order
new_df = df[['job_posting', 'for_example_rating', 'technologies_required']]

Now a 1 in one of the excluded columns (job_posting and for_example_rating here) no longer ends up in technologies_required.

Unfortunately, I don't know whether this is good or bad practice.
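A runnable version of the revised code on the sample data could look like this. It assumes the placeholder for_example_rating is replaced by the actual non-technology columns; with this exclude-list approach a 1 is ignored only in the columns that are listed, so every one of them has to appear in exclude (a hypothetical name chosen for the sketch).

```python
import io

import pandas as pd

CSV = """job_posting,Job_description,rating,company_name,big_data,spark,hadoop,power bi,excel,python,azure,Founded,Location
1,blalbblablabla,3,Exson,1,0,1,,,,0,2007,US
2,blalbblablabla,4,1,,,,1,1,,0,2010,EU
3,blalbblablabla,0,Wine20,1,1,,,,1,1,2000,LA
"""
df = pd.read_csv(io.StringIO(CSV))

# Assumed list: every column that is not a technology flag.
exclude = ["job_posting", "Job_description", "rating", "company_name", "Founded", "Location"]

# For each row, keep the labels whose value equals 1 and that are not excluded.
df["technologies_required"] = df.apply(
    lambda x: ", ".join(x.index[x.eq(1) & ~x.index.isin(exclude)]), axis=1
)
print(df["technologies_required"].tolist())
```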


Answer 1

Score: 1

You can use apply to boolean-index the column names:

techs = df.apply(lambda x: ", ".join(x.index[x.eq(1)]), axis=1)
#techs = df.apply(lambda x: ", ".join(x.eq(1).loc[lambda s: s].index), axis=1) # variant

out = df[["job_posting"]].join(techs.rename("technologies_required"))

Output:

print(out)

   job_posting technologies_required
0         5674       big_data, spark
1        13037                 excel
2         4377        powerbi, azure
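Running this first version on the question's sample data (rather than on the frame behind the output above) shows the problem raised in the update: job_posting is 1 in the first row, so it is reported as a technology. This is a sketch assuming the sample CSV from the question; the commented variant yields the same strings.

```python
import io

import pandas as pd

CSV = """job_posting,Job_description,rating,company_name,big_data,spark,hadoop,power bi,excel,python,azure,Founded,Location
1,blalbblablabla,3,Exson,1,0,1,,,,0,2007,US
2,blalbblablabla,4,1,,,,1,1,,0,2010,EU
3,blalbblablabla,0,Wine20,1,1,,,,1,1,2000,LA
"""
df = pd.read_csv(io.StringIO(CSV))

# Both spellings from the answer: boolean-index the labels of each row.
techs = df.apply(lambda x: ", ".join(x.index[x.eq(1)]), axis=1)
variant = df.apply(lambda x: ", ".join(x.eq(1).loc[lambda s: s].index), axis=1)

out = df[["job_posting"]].join(techs.rename("technologies_required"))
print(out)
```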

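Update:

> There is a little problem in this code: if job_posting or any other column (apart from the technology columns) has a value of 1, that column is included as well.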

import pandas as pd

# Identifier and metadata columns on either side of the technology flags
lcols = ["job_posting", "Job_description", "rating", "company_name"]
rcols = ["Founded", "Location"]

# Only the remaining (technology) columns are scanned for 1s
techs = df.drop(columns=lcols+rcols).apply(lambda x: ", ".join(x.index[x.eq(1)]), axis=1)

out = pd.concat([df[lcols], techs.rename("technologies_required"), df[rcols]], axis=1)

Output:

print(out)

   job_posting Job_description  rating company_name           technologies_required  Founded Location
0            1  blalbblablabla       3        Exson                big data, hadoop     2007       US
1            2  blalbblablabla       1            1                 excel, power bi     2010       EU
2            3  blalbblablabla       0       Wine20  azure, big data, python, spark     2000       LA
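If the technology columns always form one contiguous block, an alternative to listing the other columns is to slice that block by label and put the summary back where it was with DataFrame.insert. This is a sketch assuming the block runs from big_data to azure, as in the question's sample. Writing the result with to_csv quotes the comma-separated field, so the output stays valid CSV.

```python
import io

import pandas as pd

CSV = """job_posting,Job_description,rating,company_name,big_data,spark,hadoop,power bi,excel,python,azure,Founded,Location
1,blalbblablabla,3,Exson,1,0,1,,,,0,2007,US
2,blalbblablabla,4,1,,,,1,1,,0,2010,EU
3,blalbblablabla,0,Wine20,1,1,,,,1,1,2000,LA
"""
df = pd.read_csv(io.StringIO(CSV))

# Assumption: the one-hot technology columns are contiguous, big_data..azure.
tech = df.loc[:, "big_data":"azure"]

# Labels of the cells equal to 1, joined in column order.
techs = tech.eq(1).apply(lambda row: ", ".join(row.index[row]), axis=1)

# Drop the flag columns and insert the summary where the block started.
out = df.drop(columns=tech.columns)
out.insert(df.columns.get_loc("big_data"), "technologies_required", techs)

# Fields that contain the delimiter are quoted automatically.
print(out.to_csv(index=False))
```

The labels come out in column order, so on this sample they match the desired output from the question.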

  • Posted by huangapple on 13 May 2023 at 21:50:28
  • If you repost this article, please keep this link: https://go.coder-hub.com/76243072.html