How to use a variable as the value of the replace function in Python pandas

Question

I have a DataFrame in which the column body contains an image path that I want to replace with the corresponding baseUrl + id + .webp.

import pandas as pd

df = pd.DataFrame({ 'id': ["982951473","000590051"],
                    'body': ["<script type=""application/ld+json"">{""image"":""https:\/\/www.f.it\/media\/catalog\/product\/2\/4\/240297_1.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265"",""sku"":""982951473""}</script>","<script type=""application/ld+json"">{""image"":""https:\/\/www.f.it\/media\/catalog\/product\/0\/0\/000590051_rinazina_spray.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265"",""sku"":""000590051""}</script>"]})

var = "f.com\/" + df['id'] + ".webp"
df_r = df.replace(to_replace=r'www[.]f[.]it[\\].*?["]', value=var, regex=True)

print(df_r.to_string())

In the end, this example URL

"https:\/\/www.f.it\/media\/catalog\/product\/2\/4\/240297_1.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265"

should become this URL, which contains the value of the id column:

"https:\/\/f.com\/982951473.webp"

Answer 1

Score: 1

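The first error is the doubled double quotes in the DataFrame you provided. Python concatenates adjacent string literals, so "...""sku"":""982951473""..." evaluates to sku:982951473 and not the "sku":"982951473" you expected. The closing quote that your regex looks for is therefore never present in the data. I modified your DataFrame to use single-quoted string literals so that the inner double quotes are preserved.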
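Before the modified DataFrame, here is a small check of the concatenation behaviour described above. The two variable names are made up for this illustration, and the string is a shortened stand-in for the real body value.

```python
# Inside a double-quoted literal, "" closes one literal and opens the next.
# Python then joins the adjacent literals, so the inner quotes disappear.
doubled = "{""sku"":""982951473""}"

# A single-quoted literal keeps the inner double quotes as data.
single_quoted = '{"sku":"982951473"}'

print(doubled)        # {sku:982951473}
print(single_quoted)  # {"sku":"982951473"}
```

The modified DataFrame and the replacement code follow.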

import pandas as pd
import re

REGEX_URL = re.compile(r'www[.]f[.]it\\.*?"')

df = pd.DataFrame({
    'id': ['982951473', '000590051'],
    'body': [
        # Raw strings keep the JSON-escaped "\/" sequences without invalid-escape warnings.
        r'<script type="application/ld+json">{"image":"https:\/\/www.f.it\/media\/catalog\/product\/2\/4\/240297_1.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265","sku":"982951473"}</script>',
        r'<script type="application/ld+json">{"image":"https:\/\/www.f.it\/media\/catalog\/product\/0\/0\/000590051_rinazina_spray.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265","sku":"000590051"}</script>',
    ]
})

print(df.to_string())

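def replace_url(product_id, body):
    # The regex consumes the closing quote of the URL, so the replacement adds it back.
    # The raw f-string keeps "\/" as written and avoids shadowing the built-in id().
    rep = rf'f.com\/{product_id}.webp"'
    return REGEX_URL.sub(rep, body)

# Apply the replacement row by row so that each body gets its own id.
df['body'] = df.apply(lambda row: replace_url(row['id'], row['body']), axis=1)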

print(df.to_string())

Alternatively, loop over the rows explicitly. I have not measured which approach is better or faster.

for i, row in df.iterrows():
    # Build the per-row replacement; the trailing quote restores the one the regex consumed.
    rep = rf'f.com\/{row["id"]}.webp"'
    df.at[i, 'body'] = REGEX_URL.sub(rep, row['body'])

Output (first print before the replacement, second print after):

          id  body
0  982951473  <script type="application/ld+json">{"image":"https:\/\/www.f.it\/media\/catalog\/product\/2\/4\/240297_1.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265","sku":"982951473"}</script>
1  000590051  <script type="application/ld+json">{"image":"https:\/\/www.f.it\/media\/catalog\/product\/0\/0\/000590051_rinazina_spray.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265","sku":"000590051"}</script>
          id  body
0  982951473  <script type="application/ld+json">{"image":"https:\/\/f.com\/982951473.webp","sku":"982951473"}</script>
1  000590051  <script type="application/ld+json">{"image":"https:\/\/f.com\/000590051.webp","sku":"000590051"}</script>
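A third variant, offered only as a sketch, pairs each id with its body using zip and a list comprehension. It also passes a function to sub so that the replacement text is inserted literally, with no escape processing. The helper name replace_url and the shortened one-row DataFrame are assumptions made for this example, and I have not benchmarked it against the two approaches above.

```python
import json
import re

import pandas as pd

REGEX_URL = re.compile(r'www[.]f[.]it\\.*?"')

# Shortened, hypothetical stand-in for the real body column (JSON part only).
df = pd.DataFrame({
    'id': ['982951473'],
    'body': [r'{"image":"https:\/\/www.f.it\/media\/240297_1.jpg?width=265","sku":"982951473"}'],
})

def replace_url(product_id, body):
    # The regex consumes the closing quote, so the replacement adds it back.
    replacement = rf'f.com\/{product_id}.webp"'
    # A callable replacement is inserted as-is, without escape processing.
    return REGEX_URL.sub(lambda match: replacement, body)

# Pair every id with its own body instead of passing a whole Series as the value.
df['body'] = [replace_url(i, b) for i, b in zip(df['id'], df['body'])]

new_body = df.loc[0, 'body']
print(new_body)

# "\/" is a valid JSON escape for "/", so the result still parses as JSON.
print(json.loads(new_body)['image'])
```

Parsing the result with json.loads is a cheap way to confirm that the closing quote was restored and that the string is still well-formed JSON.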

huangapple
  • Posted on January 9, 2023, 11:13:19
  • Please keep this link when reposting: https://go.coder-hub.com/75052865.html