Manipulating Values in Pandas DataFrames

Question

I am trying to create and apply a function, def change(x), that modifies a single column, grocery, in the grocery data frame shown in the image below.
[image: grocery data]

I want to achieve the result shown in the image below.
[image: output]

I am a beginner in Python, but I know I can use the map() or apply() functions to solve this. My main problem is using the split() method to get the result, because the store names in the grocery column vary in length (some are one word, some are two). Or are there other string manipulation methods that can be used?

import pandas as pd

groceries = {
    'grocery': ["Tesco's wafers", "Asda's shortbread", "Aldi's lemon tea", "Sainsbury's croissant", "Morrison's doughnut", "Amazon fresh's peppermint tea", "Bar becan's pizza", "Pound savers' shower gel"],
    'category': ['biscuit', 'biscuit', 'tea', 'bakery', 'bakery', 'tea', 'bakery', 'hygiene'],
    'price': [0.99, 1.24, 1.89, 0.75, 0.50, 2.5, 4.99, 2]
}

df = pd.DataFrame(groceries)
df

# function to modify a single column of values - grocery
# (my attempt: it ignores x, works on the whole column, and keeps only
# the second space-separated token, so it does not give the expected result)
def change(x):
    return df['grocery'].str.split(' ').str[1]

df['grocery'] = df['grocery'].map(change)
df


# Expected DataFrame
expected_df = pd.DataFrame({
    'grocery': ['Wafers', 'Shortbread', 'Lemon Tea', 'Croissant', 'Doughnut', 'Peppermint Tea', 'Pizza', 'Shower Gel'],
    'category': ['biscuit', 'biscuit', 'tea', 'bakery', 'bakery', 'tea', 'bakery', 'hygiene'],
    'price': [0.99, 1.24, 1.89, 0.75, 0.50, 2.5, 4.99, 2]
})

Answer 1

Score: 0

Assuming you have a dataframe df with your original data:

df['grocery'] = df['grocery'].apply(lambda x: (x.split("'s", 1) if "'s" in x else x.split("'", 1))[1].strip().title())
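
Since the question asks for a named function change(x) used with map(), the same idea can be sketched as below. This is only one possible version, not the answerer's code: it uses str.partition instead of split, and it assumes the store name always ends at the first apostrophe. The fallback for values without an apostrophe is an added assumption, and the three-row data frame is a shortened sample of the question's data.

```python
import pandas as pd

def change(x):
    # Split once at the first apostrophe: "Tesco's wafers" -> tail "s wafers".
    _, sep, tail = x.partition("'")
    if not sep:
        # Assumption: values without an apostrophe are only title-cased.
        return x.title()
    # Drop the possessive "s" when the apostrophe is followed by "s ".
    if tail.startswith("s "):
        tail = tail[1:]
    # Trim the leftover space and capitalise each word.
    return tail.strip().title()

# Shortened sample of the question's data.
df = pd.DataFrame({
    "grocery": ["Tesco's wafers", "Amazon fresh's peppermint tea",
                "Pound savers' shower gel"],
})
df["grocery"] = df["grocery"].map(change)
```

map() calls change once per value, so x is a plain Python string here and ordinary string methods apply.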

Answer 2

Score: 0

I hope this works for you. I split each string on the apostrophe ("'"), take the part after it, drop its first character, and strip any leftover space. Data with other patterns may need extra conditions.

import pandas as pd

groceries = {
    'grocery': [
        "Tesco's wafers", "Asda's shortbread", "Aldi's lemon tea",
        "Sainsbury's croissant", "Morrison's doughnut",
        "Amazon fresh's peppermint tea", "Bar becan's pizza",
        "Pound savers' shower gel"
    ],
    'category': [
        'biscuit', 'biscuit', 'tea', 'bakery', 'bakery', 'tea', 'bakery',
        'hygiene'
    ],
    'price': [0.99, 1.24, 1.89, 0.75, 0.50, 2.5, 4.99, 2]
}

df = pd.DataFrame(groceries)

# If the grocery strings need several conditions, use a named function instead:

# def grocery_chng(x):
#     # specify multiple conditions to replace a string
#     return x
# df['grocery'] = df['grocery'].apply(grocery_chng)

# Split on the apostrophe, drop the first character of the remainder
# (the possessive "s" or a space), then strip and title-case
df['grocery'] = df['grocery'].apply(lambda x: x.split("'")[1][1:].strip().title())
df
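
On the question of other string manipulation methods: a vectorised alternative is to remove the store name with a regular expression through the .str accessor, with no apply() or map() at all. The sketch below is not taken from either answer. The pattern is an assumption that every store name ends with an apostrophe, optionally followed by "s", and then whitespace.

```python
import pandas as pd

df = pd.DataFrame({
    "grocery": [
        "Tesco's wafers", "Asda's shortbread", "Aldi's lemon tea",
        "Sainsbury's croissant", "Morrison's doughnut",
        "Amazon fresh's peppermint tea", "Bar becan's pizza",
        "Pound savers' shower gel",
    ],
})

# ^.*?'  : everything up to and including the first apostrophe
# s?     : an optional possessive "s"
# \s+    : the whitespace before the product name
pattern = r"^.*?'s?\s+"

# Remove the matched prefix, then capitalise each remaining word.
df["grocery"] = df["grocery"].str.replace(pattern, "", regex=True).str.title()
```

Values that do not match the pattern are left as they are and only title-cased, so unexpected rows are easy to spot afterwards.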

Posted by huangapple on February 6, 2023 at 09:53:20
When reposting, please keep the link to this article: https://go.coder-hub.com/75356734.html