January 9, 2023 11:13:19

How to use a variable as the value of the replace function in Python pandas

Question

I have a DataFrame where the column body contains an image path that I want to replace with the corresponding baseUrl + id + .webp for that row.

import pandas as pd
df = pd.DataFrame({ 'id': ["982951473","000590051"],
                    'body': ["<script type=""application/ld+json"">{""image"":""https:\/\/www.f.it\/media\/catalog\/product\/2\/4\/240297_1.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265"",""sku"":""982951473""}</script>","<script type=""application/ld+json"">{""image"":""https:\/\/www.f.it\/media\/catalog\/product\/0\/0\/000590051_rinazina_spray.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265"",""sku"":""000590051""}</script>"]})
var = "f.com\/" + df['id'] + ".webp"
df_r = df.replace(to_replace=r'www[.]f[.]it[\\].*?["]', value=var, regex=True)
print(df_r.to_string())

In the end, this example URL

"https:\/\/www.f.it\/media\/catalog\/product\/2\/4\/240297_1.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265"

should become this URL, which contains the id value:

"https:\/\/f.com\/982951473.webp"


Answer 1

Score: 1

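The first error was the doubled double quotes in the DataFrame you provided. Python concatenates adjacent string literals, so "...""sku"":""982951473""..." evaluates to ...sku:982951473... rather than the "sku":"982951473" you expected. Since no double quote survives in body, your regex, which ends with ["], can never match. I rewrote your DataFrame using single-quoted strings so the double quotes are kept.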
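A minimal illustration of that concatenation, using a shortened fragment of the question's sku field (the variable names are made up for this example):

```python
# Adjacent string literals are joined at compile time, so each "" pair in the
# question's data is just the end of one literal and the start of the next.
doubled = "{""sku"":""982951473""}"
print(doubled)  # {sku:982951473}

# A single-quoted literal keeps the double quotes as real characters.
quoted = '{"sku":"982951473"}'
print(quoted)   # {"sku":"982951473"}

# A pattern that needs a closing double quote can only match the second form.
print('"' in doubled, '"' in quoted)  # False True
```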

import re

import pandas as pd

# Matches from "www.f.it\" up to and including the next double quote
REGEX_URL = re.compile(r'www[.]f[.]it\\.*?"')

# Single-quoted raw strings: the double quotes and the JSON-escaped "\/" are kept literally
df = pd.DataFrame({
    'id': ['982951473', '000590051'],
    'body': [
        r'<script type="application/ld+json">{"image":"https:\/\/www.f.it\/media\/catalog\/product\/2\/4\/240297_1.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265","sku":"982951473"}</script>',
        r'<script type="application/ld+json">{"image":"https:\/\/www.f.it\/media\/catalog\/product\/0\/0\/000590051_rinazina_spray.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265","sku":"000590051"}</script>',
    ]
})
print(df.to_string())


def replace_url(row_id, body):
    # "\/" mirrors the escaped slashes already in body; re.sub leaves "\/" as-is in the replacement
    rep = rf'f.com\/{row_id}.webp"'
    return REGEX_URL.sub(rep, body)


# Apply row by row so each body is rewritten with its own id
df['body'] = df.apply(lambda row: replace_url(row['id'], row['body']), axis=1)
print(df.to_string())

Or this way; I have no idea which approach is better/faster.

for i, row in df.iterrows():
    rep = rf'f.com\/{df.at[i, "id"]}.webp"'
    df.at[i, 'body'] = REGEX_URL.sub(rep, row['body'])
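On the better/faster question, one more option that is commonly used is a plain list comprehension over zip(df['id'], df['body']), which avoids the per-row Series that both apply(axis=1) and iterrows() build. This is a sketch that has not been benchmarked against the two versions above; the shortened URLs (query strings trimmed) and the helper name new_url are assumptions for illustration. It also passes a function as the replacement, so re.sub inserts the returned text verbatim instead of interpreting backslashes in it.

```python
import re

import pandas as pd

# Same pattern as above: from "www.f.it\" up to and including the next double quote.
REGEX_URL = re.compile(r'www[.]f[.]it\\.*?"')

# Shortened bodies (query strings trimmed) for illustration only.
df = pd.DataFrame({
    'id': ['982951473', '000590051'],
    'body': [
        r'<script type="application/ld+json">{"image":"https:\/\/www.f.it\/media\/catalog\/product\/2\/4\/240297_1.jpg?optimize=medium","sku":"982951473"}</script>',
        r'<script type="application/ld+json">{"image":"https:\/\/www.f.it\/media\/catalog\/product\/0\/0\/000590051_rinazina_spray.jpg?optimize=medium","sku":"000590051"}</script>',
    ],
})


def new_url(row_id):
    # Hypothetical helper: builds the replacement text, keeping the JSON-style "\/" escape.
    return rf'f.com\/{row_id}.webp"'


# One pass over plain Python strings; the lambda's return value is inserted literally.
df['body'] = [
    REGEX_URL.sub(lambda _match: new_url(row_id), body)
    for row_id, body in zip(df['id'], df['body'])
]
print(df.to_string())
```

Whether this is actually faster for your data is worth checking with timeit on a realistic number of rows.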

Output:

          id  body
0  982951473  <script type="application/ld+json">{"image":"https:\/\/www.f.it\/media\/catalog\/product\/2\/4\/240297_1.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265","sku":"982951473"}</script>
1  000590051  <script type="application/ld+json">{"image":"https:\/\/www.f.it\/media\/catalog\/product\/0\/0\/000590051_rinazina_spray.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265","sku":"000590051"}</script>
          id  body
0  982951473  <script type="application/ld+json">{"image":"https:\/\/f.com\/982951473.webp","sku":"982951473"}</script>
1  000590051  <script type="application/ld+json">{"image":"https:\/\/f.com\/000590051.webp","sku":"000590051"}</script>

