How to use a variable as the value of the replace function in Python pandas
Question
I have a DataFrame in which the column body contains an image path that I want to replace with the corresponding baseUrl + id + .webp:
import pandas as pd
df = pd.DataFrame({ 'id': ["982951473","000590051"],
'body': ["<script type=""application/ld+json"">{""image"":""https:\/\/www.f.it\/media\/catalog\/product\/2\/4\/240297_1.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265"",""sku"":""982951473""}</script>","<script type=""application/ld+json"">{""image"":""https:\/\/www.f.it\/media\/catalog\/product\/0\/0\/000590051_rinazina_spray.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265"",""sku"":""000590051""}</script>"]})
var = "f.com\/" + df['id'] + ".webp"
df_r = df.replace(to_replace=r'www[.]f[.]it[\\].*?["]', value=var, regex=True)
print(df_r.to_string())
In the end, this example URL
""https:\/\/www.f.it\/media\/catalog\/product\/2\/4\/240297_1.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265""
should become this URL, which contains the value of the id column:
""https:\/\/f.com\/982951473.webp""
Answer 1
Score: 1
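The first error is the doubled double quotes in the DataFrame you provided. Python concatenates adjacent string literals, so "...""sku"":""982951473""..." evaluates to ...sku:982951473... with the quotes dropped, not the "sku":"982951473" you expected. I modified your DataFrame to use single-quoted strings so that the inner double quotes survive.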
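The quoting issue can be reproduced in isolation. The short sketch below uses a cut-down literal (the variable names `doubled` and `single` are only illustrative) to show how the doubled quotes disappear:

```python
# Adjacent string literals are joined by Python, so "" inside a
# double-quoted literal closes one literal and opens the next one.
doubled = "{""sku"":""982951473""}"
single = '{"sku":"982951473"}'

print(doubled)  # {sku:982951473}
print(single)   # {"sku":"982951473"}
```

Wrapping the text in single quotes (or a raw string) avoids this, because the inner double quotes no longer terminate the literal.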
import re

import pandas as pd

# Match from the old host up to and including the closing double quote of the URL.
REGEX_URL = re.compile(r'www[.]f[.]it\\.*?"')

df = pd.DataFrame({
    'id': ['982951473', '000590051'],
    'body': [
        # Raw strings keep the JSON-escaped slashes (\/) exactly as written.
        r'<script type="application/ld+json">{"image":"https:\/\/www.f.it\/media\/catalog\/product\/2\/4\/240297_1.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265","sku":"982951473"}</script>',
        r'<script type="application/ld+json">{"image":"https:\/\/www.f.it\/media\/catalog\/product\/0\/0\/000590051_rinazina_spray.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265","sku":"000590051"}</script>',
    ]
})
print(df.to_string())

def replace_url(row_id, body):
    # Build the replacement for this row; the regex consumes the closing
    # double quote, so it is appended again here.
    rep = rf'f.com\/{row_id}.webp"'
    return REGEX_URL.sub(rep, body)

# axis=1 passes each row to the lambda, so every body gets its own id.
df['body'] = df.apply(lambda x: replace_url(x['id'], x['body']), axis=1)
print(df.to_string())
Alternatively, loop over the rows with iterrows(). I have no idea which is the better or faster approach.
for i, row in df.iterrows():
    # Same replacement as above, written back in place with df.at.
    rep = rf'f.com\/{row["id"]}.webp"'
    df.at[i, 'body'] = REGEX_URL.sub(rep, row['body'])
Output (the DataFrame before and after the replacement):
id body
0 982951473 <script type="application/ld+json">{"image":"https:\/\/www.f.it\/media\/catalog\/product\/2\/4\/240297_1.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265","sku":"982951473"}</script>
1 000590051 <script type="application/ld+json">{"image":"https:\/\/www.f.it\/media\/catalog\/product\/0\/0\/000590051_rinazina_spray.jpg?optimize=medium&fit=bounds&height=265&width=265&canvas=265:265","sku":"000590051"}</script>
id body
0 982951473 <script type="application/ld+json">{"image":"https:\/\/f.com\/982951473.webp","sku":"982951473"}</script>
1 000590051 <script type="application/ld+json">{"image":"https:\/\/f.com\/000590051.webp","sku":"000590051"}</script>
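As a further variation on the two approaches above, the per-row replacement can also be written as a list comprehension over the id and body columns. This is only a sketch and has not been benchmarked against the other versions. The helper name `swap_url` is hypothetical, the sample bodies are shortened from the ones in the question, and a function is passed to `sub` so that the id is inserted literally instead of being parsed as a replacement template.

```python
import re

import pandas as pd

# Same pattern as in the answer: old host up to and including the closing quote.
REGEX_URL = re.compile(r'www[.]f[.]it\\.*?"')

# Shortened sample data (assumption: query strings trimmed for readability).
df = pd.DataFrame({
    'id': ['982951473', '000590051'],
    'body': [
        r'{"image":"https:\/\/www.f.it\/media\/catalog\/product\/2\/4\/240297_1.jpg?optimize=medium","sku":"982951473"}',
        r'{"image":"https:\/\/www.f.it\/media\/catalog\/product\/0\/0\/000590051_rinazina_spray.jpg?optimize=medium","sku":"000590051"}',
    ],
})

def swap_url(row_id, body):
    # The regex consumes the closing double quote, so add it back.
    new_tail = rf'f.com\/{row_id}.webp"'
    # A callable replacement is used verbatim, with no escape processing.
    return REGEX_URL.sub(lambda match: new_tail, body)

# Pair each id with its own body and rebuild the column in one pass.
df['body'] = [swap_url(i, b) for i, b in zip(df['id'], df['body'])]
print(df['body'].tolist())
```

The callable replacement matters mainly when the inserted value could contain backslashes or group references; with purely numeric ids, a plain replacement string behaves the same.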