将openpyxl转换为pandas

huangapple go评论62阅读模式
英文:

Convert openpyxl to pandas

问题

在做了一些研究后我了解到pandas相对于openpyxl可以提供更好的性能
我正在尝试将这段代码从openpyxl转换为pandas
```python
    def get_restaurant_data(self, res: Restaurant, sf_id: int) -> RestaurantSchema:
        df = pd.read_excel(self.path, index_col=None)
        filtered_df = df[df[df.columns[0]] == sf_id]
        
        if not filtered_df.empty:
            row = filtered_df.iloc[0]
            res.address = self.check_data(row.iloc[1], res.address)
            if self.check_data(row.iloc[2]) and self.check_data(row.iloc[3]):
                res.loc = Point(row.iloc[3], row.iloc[2])
            res.phone = self.check_data(row.iloc[4], res.manager_phone)

            logo = self.download_file(row.iloc[5])
            cover = self.download_file(row.iloc[6])
            if logo:
                res.logo.save(self.check_data(row.iloc[5]), logo, save=False)
            if cover:
                res.cover.save(self.check_data(row.iloc[6]), cover, save=False)
            
            return RestaurantSchema(**res.__dict__)

<details>
<summary>英文:</summary>

After doing some research I understood that pandas can give us better performace comparing to openpyxl.
I&#39;m trying to convert this code from openpyxl to pandas:
```python
    def get_restaurant_data(self, res: Restaurant, sf_id: int) -&gt; RestaurantSchema:
        sheet = self.workbook[&quot;VendorInfo&quot;]
        for row in sheet.iter_rows(min_row=2):
            if row[0].value == sf_id:
                res.address = self.check_data(row[1].value, res.address)
                if self.check_data(row[2].value) and self.check_data(row[3].value):
                    res.loc = Point(row[3].value, row[2].value)
                res.phone = self.check_data(row[4].value, res.manager_phone)

                logo = self.download_file(row[5].value)
                cover = self.download_file(row[6].value)
                if logo:
                    res.logo.save(self.check_data(row[5].value), logo, save=False)
                if cover:
                    res.cover.save(self.check_data(row[6].value), cover, save=False)
                return RestaurantSchema(**res.__dict__)

Anyone can give me a suggestion about how to convert this code?

        df = pd.read_excel(self.path,index_col=None)
        print(f&#39;&gt;&gt;&gt;{df}&#39;)```

</details>


# 答案1
**得分**: 1

以下是使用pandas的等效代码:

```python
import pandas as pd

def get_restaurant_data(self, res: Restaurant, sf_id: int) -> RestaurantSchema:
    sheet = self.workbook["VendorInfo"]
    df = pd.DataFrame(sheet.values)
    for i, row in df[1:].iterrows():
        if row[0] == sf_id:
            res.address = self.check_data(row[1], res.address)
            if self.check_data(row[2]) and self.check_data(row[3]):
                res.loc = Point(row[3], row[2])
            res.phone = self.check_data(row[4], res.manager_phone)

            logo = self.download_file(row[5])
            cover = self.download_file(row[6])
            if logo:
                res.logo.save(self.check_data(row[5]), logo, save=False)
            if cover:
                res.cover.save(self.check_data(row[6]), cover, save=False)
            return RestaurantSchema(**res.__dict__)

这段代码使用pandas从“VendorInfo”表中读取数据到一个数据帧中,然后使用iterrows()方法遍历数据帧的行。其余的代码基本与原始代码相同,唯一的区别是使用[]运算符访问每行中的值,而不是值属性。

希望这有所帮助!

英文:

Here is the equivalent code using pandas:

import pandas as pd

def get_restaurant_data(self, res: Restaurant, sf_id: int) -&gt; 
RestaurantSchema:
sheet = self.workbook[&quot;VendorInfo&quot;]
df = pd.DataFrame(sheet.values)
for i, row in df[1:].iterrows():
    if row[0] == sf_id:
        res.address = self.check_data(row[1], res.address)
        if self.check_data(row[2]) and self.check_data(row[3]):
            res.loc = Point(row[3], row[2])
        res.phone = self.check_data(row[4], res.manager_phone)

        logo = self.download_file(row[5])
        cover = self.download_file(row[6])
        if logo:
            res.logo.save(self.check_data(row[5]), logo, save=False)
        if cover:
            res.cover.save(self.check_data(row[6]), cover, save=False)
        return RestaurantSchema(**res.__dict__)

This code uses pandas to read the data from the "VendorInfo" sheet into a dataframe, and then uses the iterrows() method to iterate over the rows of the dataframe. The rest of the code is mostly the same as the original, with the exception of using the [] operator to access the values in each row instead of the value attribute.

I hope this helps!

huangapple
  • 本文由 发表于 2023年1月9日 04:38:17
  • 转载请务必保留本文链接:https://go.coder-hub.com/75051105.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定