wFastCGI / Flask - Restarting webserver on IIS

Question

I'm building a web app that fetches data from an API and displays it. For that, I'm using Flask and the requests library. Because the API is not well laid out, I need to make a bunch of API calls to get all the data I need.

Here is what the simplified folder structure looks like:

app.py
api/
  api.py

To avoid overloading the API by sending hundreds of API requests on every GET request, I tried to implement a function that fetches the data on web server start, stores it in a variable, and refreshes the data after a specific time. Here is a simplified API class and refresh function:

import threading
import time

import requests


class API:
    """The API class gets initialized on web server start."""

    def __init__(self):
        self.API_KEY = 'xxx-xxx'
        self.BASE_URL = 'https://xxxxxxxx.com/3'
        self.HEADER = {
            'X-Api-Key': f'{self.API_KEY}',
            'Accept': 'application/json'
        }

        self.session = requests.session()
        self.session.headers.update(self.HEADER)

        self.data = {}
        self.refresh_time = 900  # seconds to wait until the next refresh

        # Start the refresh loop in a background thread
        threading.Thread(target=self.refresh_data).start()

    def refresh_data(self):
        while True:
            # _refresh() (not shown) fetches the data from the API and
            # stores/refreshes it in the self.data JSON
            self._refresh()
            time.sleep(self.refresh_time)

I know it's probably not the best way to handle this, but in my venv, it works without problems.

When I deploy this web app to production on Windows IIS with wFastCGI, the web server gets restarted randomly (I didn't notice any pattern), so the API class gets initialized multiple times, meaning the refresh function gets started multiple times.

Here is some logging of the web server:

2023-06-05 07:54:29,298 [MainThread  ] [            <module>()] [INFO ]  Setting up APIs...        # Log from web server
2023-06-05 07:54:29,299 [MainThread  ] [            __init__()] [DEBUG]  API Class init            > debug log in API class
2023-06-05 07:54:29,377 [MainThread  ] [               index()] [INFO ]  GET from 192.168.18.125   # GET request
2023-06-05 07:54:30,001 [MainThread  ] [            <module>()] [INFO ]  Setting up APIs...        # Log from web server
2023-06-05 07:54:30,001 [MainThread  ] [            <module>()] [INFO ]  Setting up APIs...        # Log from web server
2023-06-05 07:54:30,001 [MainThread  ] [            __init__()] [DEBUG]  API Class init            >
2023-06-05 07:54:30,001 [MainThread  ] [            __init__()] [DEBUG]  API Class init            > debug log from the same API class
2023-06-05 07:54:30,002 [Thread-1 (_s] [        refresh_data()] [INFO ]  Checking data...
2023-06-05 07:54:30,002 [Thread-1 (_s] [        refresh_data()] [INFO ]  Checking data...
2023-06-05 07:54:30,006 [Thread-1 (_s] [            _refresh()] [INFO ]  Refreshing data...
2023-06-05 07:54:30,007 [Thread-1 (_s] [       get_something()] [INFO ]  Getting data...

I already did some research; maybe this helps.

  1. A wfastcgi GitHub question: because of it, I thought the server was restarting because I was writing the logs to a file in the web server folder, so I wrote the logs outside the folder, but the server kept restarting (I also tried editing the web.config, but nothing worked for me).
  2. A Microsoft Dev Network question: a similar question I found.

Can anyone explain this behavior to me? I would appreciate it if there are any suggestions on how to handle a timed API call or, in other words, a queue.

EDIT:

I found out that IIS has a load balancing feature, which can either load a website (or web app) on demand or keep the website always running.

Here is what I found: IIS - "Always On" Application Pool

But the feature has no impact on wFastCGI; the application is still restarting.


Answer 1

Score: 0

Why do you want to use the web server's memory to hold data? This pattern is not recommended, as web servers may restart, and data can be lost.

Why not use a cache like Redis or a database, and have your own API pull the data from it and return it to the client?

To ensure the cache/database layer has the latest data, you can set up a separate cron job that runs regularly, fetches the data from the external API, and populates your data store.

This will avoid the headache of using your web server as a data store.
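
A minimal sketch of that split, assuming a SQLite file as the data store instead of Redis: a scheduled job (cron, or Task Scheduler on Windows) calls refresh_store, and the Flask views only call read_store. The table name api_cache, the function names, and fake_fetch are hypothetical placeholders for illustration; a real fetch function would make the actual API calls.

```python
import json
import sqlite3
import time


def init_store(conn):
    """Create the key/value table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS api_cache ("
        "key TEXT PRIMARY KEY, payload TEXT NOT NULL, fetched_at REAL NOT NULL)"
    )
    conn.commit()


def refresh_store(conn, fetch):
    """Run by the scheduled job: call fetch() and upsert every result."""
    now = time.time()
    for key, value in fetch().items():
        conn.execute(
            "INSERT OR REPLACE INTO api_cache (key, payload, fetched_at) "
            "VALUES (?, ?, ?)",
            (key, json.dumps(value), now),
        )
    conn.commit()


def read_store(conn, key):
    """Called from a web request: return the stored value, or None if absent."""
    row = conn.execute(
        "SELECT payload FROM api_cache WHERE key = ?", (key,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def fake_fetch():
    # Hypothetical stand-in for the real API calls.
    return {"endpoint1": [1, 2, 3], "endpoint2": {"status": "ok"}}
```

The scheduled script would open the file with sqlite3.connect(...), call init_store and refresh_store, and exit. The web app opens the same file and only reads from it, so the stored data should survive however often the worker process is started or recycled.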


Now, if you still want to go down this route, make sure your Flask app is not hot-reloading or watching for file changes, which can happen if you are running in debug mode. Make sure you use the production configuration.

Another area to explore is the IIS/FastCGI settings. Are you watching for file changes in the root directory or any other directory? This could be the cause of the restarts.
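
While investigating the restarts, it may help to put the process ID into every log line, so that repeated "Setting up APIs..." entries can be attributed either to one process that was restarted or to several worker processes running side by side. The sketch below uses the standard logging attributes %(process)d and %(threadName)s; the logger name webapp.diagnostics and the helper build_logger are assumptions made for this example, not part of the original app.

```python
import io
import logging
import os


def build_logger(stream):
    """Return a logger whose lines include the process ID and thread name."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s [pid %(process)d] [%(threadName)s] [%(funcName)s()] "
        "[%(levelname)s] %(message)s"
    ))
    logger = logging.getLogger("webapp.diagnostics")  # hypothetical logger name
    logger.handlers.clear()   # avoid duplicate handlers if called twice
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the root logger from printing the line again
    return logger


# Write to an in-memory buffer here; a real app would log to a file instead.
buffer = io.StringIO()
log = build_logger(buffer)
log.info("Setting up APIs...")
```

If two lines with the same timestamp carry different process IDs, the duplicated initialization comes from separate processes rather than from a restart of a single one.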

Answer 2

Score: 0

After various attempts, and following recommendations to use some kind of cache or file export, I implemented caching in the web app, and it has worked great since then.

I was already using a session for my API requests, so I simply switched from a normal session to a cached session from requests_cache.

Here is an example of what I did:

from requests_cache import CachedSession

class Api:
    def __init__(self):
        self.API_KEY = 'xxx-xxx'
        self.BASE_URL = 'https://xxxxxxxx.com/3'
        self.HEADER = {
            'X-Api-Key': f'{self.API_KEY}',
            'Accept': 'application/json'
        }

        # Session cache setup (when cached data expires, in seconds)
        self.default_expire_after = 900
        self.urls_expire_after = {
            f'{self.BASE_URL}/endpoint1/': 900,
            f'{self.BASE_URL}/endpoint2/': 1800,
            f'{self.BASE_URL}/endpoint1': 3600
        }

        # Session that stores its cache in an SQLite file in the root directory
        self.session = CachedSession('cache',
                                     backend='sqlite',
                                     expire_after=self.default_expire_after,
                                     urls_expire_after=self.urls_expire_after)
        self.session.headers.update(self.HEADER)

The API responses with all the data are stored in the cache, and when the data expires, the session sends out a new API request. If the data hasn't expired, it takes everything from the cache.

This brings two major improvements:

  • reduced load on the API servers, because fewer API requests are sent
  • faster response times, since the session fetches the data from the cache and does not need to wait for the API to respond
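
To illustrate the expire-then-refetch behavior described above, here is a small standalone sketch of a time-based cache. It is not how requests_cache is implemented; the class TTLCache and its parameters are hypothetical names used only to show the idea of serving a stored value until it is older than expire_after seconds.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after a fixed number of seconds."""

    def __init__(self, expire_after, clock=time.monotonic):
        self.expire_after = expire_after
        self.clock = clock   # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (stored_at, value)

    def get(self, key, fetch):
        """Return the cached value, calling fetch() only if it is missing or expired."""
        entry = self._entries.get(key)
        now = self.clock()
        if entry is not None and now - entry[0] < self.expire_after:
            return entry[1]  # still fresh: no API call needed
        value = fetch()      # missing or expired: fetch again and store
        self._entries[key] = (now, value)
        return value
```

Unlike the SQLite-backed CachedSession, this sketch keeps everything in process memory, so its contents are lost when the worker process restarts.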

huangapple
  • Posted on June 8, 2023, 14:31:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76429150.html