英文:
Convert openpyxl to pandas
问题
在做了一些研究后,我了解到pandas相对于openpyxl可以提供更好的性能。
我正在尝试将这段代码从openpyxl转换为pandas:
```python
def get_restaurant_data(self, res: Restaurant, sf_id: int) -> RestaurantSchema:
df = pd.read_excel(self.path, index_col=None)
filtered_df = df[df[df.columns[0]] == sf_id]
if not filtered_df.empty:
row = filtered_df.iloc[0]
res.address = self.check_data(row.iloc[1], res.address)
if self.check_data(row.iloc[2]) and self.check_data(row.iloc[3]):
res.loc = Point(row.iloc[3], row.iloc[2])
res.phone = self.check_data(row.iloc[4], res.manager_phone)
logo = self.download_file(row.iloc[5])
cover = self.download_file(row.iloc[6])
if logo:
res.logo.save(self.check_data(row.iloc[5]), logo, save=False)
if cover:
res.cover.save(self.check_data(row.iloc[6]), cover, save=False)
return RestaurantSchema(**res.__dict__)
<details>
<summary>英文:</summary>
After doing some research I understood that pandas can give us better performace comparing to openpyxl.
I'm trying to convert this code from openpyxl to pandas:
```python
def get_restaurant_data(self, res: Restaurant, sf_id: int) -> RestaurantSchema:
sheet = self.workbook["VendorInfo"]
for row in sheet.iter_rows(min_row=2):
if row[0].value == sf_id:
res.address = self.check_data(row[1].value, res.address)
if self.check_data(row[2].value) and self.check_data(row[3].value):
res.loc = Point(row[3].value, row[2].value)
res.phone = self.check_data(row[4].value, res.manager_phone)
logo = self.download_file(row[5].value)
cover = self.download_file(row[6].value)
if logo:
res.logo.save(self.check_data(row[5].value), logo, save=False)
if cover:
res.cover.save(self.check_data(row[6].value), cover, save=False)
return RestaurantSchema(**res.__dict__)
Anyone can give me a suggestion about how to convert this code?
df = pd.read_excel(self.path,index_col=None)
print(f'>>>{df}')```
</details>
# 答案1
**得分**: 1
以下是使用pandas的等效代码:
```python
import pandas as pd
def get_restaurant_data(self, res: Restaurant, sf_id: int) -> RestaurantSchema:
sheet = self.workbook["VendorInfo"]
df = pd.DataFrame(sheet.values)
for i, row in df[1:].iterrows():
if row[0] == sf_id:
res.address = self.check_data(row[1], res.address)
if self.check_data(row[2]) and self.check_data(row[3]):
res.loc = Point(row[3], row[2])
res.phone = self.check_data(row[4], res.manager_phone)
logo = self.download_file(row[5])
cover = self.download_file(row[6])
if logo:
res.logo.save(self.check_data(row[5]), logo, save=False)
if cover:
res.cover.save(self.check_data(row[6]), cover, save=False)
return RestaurantSchema(**res.__dict__)
这段代码使用pandas从“VendorInfo”表中读取数据到一个数据帧中,然后使用iterrows()方法遍历数据帧的行。其余的代码基本与原始代码相同,唯一的区别是使用[]运算符访问每行中的值,而不是值属性。
希望这有所帮助!
英文:
Here is the equivalent code using pandas:
import pandas as pd
def get_restaurant_data(self, res: Restaurant, sf_id: int) ->
RestaurantSchema:
sheet = self.workbook["VendorInfo"]
df = pd.DataFrame(sheet.values)
for i, row in df[1:].iterrows():
if row[0] == sf_id:
res.address = self.check_data(row[1], res.address)
if self.check_data(row[2]) and self.check_data(row[3]):
res.loc = Point(row[3], row[2])
res.phone = self.check_data(row[4], res.manager_phone)
logo = self.download_file(row[5])
cover = self.download_file(row[6])
if logo:
res.logo.save(self.check_data(row[5]), logo, save=False)
if cover:
res.cover.save(self.check_data(row[6]), cover, save=False)
return RestaurantSchema(**res.__dict__)
This code uses pandas to read the data from the "VendorInfo" sheet into a dataframe, and then uses the iterrows() method to iterate over the rows of the dataframe. The rest of the code is mostly the same as the original, with the exception of using the [] operator to access the values in each row instead of the value attribute.
I hope this helps!
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论