Python execution log

Question

I'd like to create a log of a Python script's execution. For example:

import pandas as pd
data = pd.read_excel('example.xlsx')
data.head()

How can I create a log for this script in order to know who ran it, when it was executed, and when it finished? Also, supposing I take a sample of the DataFrame, how can I set a seed that I can share with another person so that they can run the script and get the same result?

Answer 1

Score: 2

You could use the logging module, which is part of Python's standard library.
You'll need a few extra lines of code to configure it to log the information you require (the time of execution and the user executing the script) and to specify the file where the log messages should be stored.

As for recording who ran the script, it depends on how you want to differentiate users. If your script is intended to run on a server, you might differentiate users by their IP addresses. Another option is the getpass module, which I use in the modified script below.
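As a possible alternative to concatenating the username into the format string, the name can be attached to each log record by a filter. This is only a sketch: `UserFilter`, `current_username`, `build_logger`, the logger name `user_demo` and the `%(user)s` field are hypothetical names chosen for illustration, not part of the answer's script. One reason to consider it is that a username containing a `%` character would otherwise be interpreted as part of the format string.

```python
import getpass
import io
import logging


class UserFilter(logging.Filter):
    """Attach a username to every record as `record.user`."""

    def __init__(self, username: str) -> None:
        super().__init__()
        self.username = username

    def filter(self, record: logging.LogRecord) -> bool:
        record.user = self.username
        return True  # never drop the record, only annotate it


def current_username(default: str = "unknown") -> str:
    """Return the login name, or `default` if it cannot be determined."""
    try:
        return getpass.getuser()
    except Exception:  # the exception type depends on platform and Python version
        return default


def build_logger(stream) -> logging.Logger:
    """Create a logger whose records carry the `user` field (assumed name)."""
    logger = logging.getLogger("user_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("[%(levelname)s | %(user)s] - %(message)s")
    )
    handler.addFilter(UserFilter(current_username()))
    logger.handlers.clear()
    logger.addHandler(handler)
    return logger


# Write to an in-memory buffer so the result can be inspected.
buffer = io.StringIO()
log = build_logger(buffer)
log.info("Script started execution!")
print(buffer.getvalue(), end="")
```

The filter is attached to the handler, so every record that handler emits gets the `user` attribute before it is formatted.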

Finally, when generating a sample from data, you can pass an integer seed to the random_state parameter so that the sample always contains the same rows.

Here's a modified version of your script with the previously mentioned changes:

# == Necessary Imports =========================================================
import logging
import pandas as pd
import getpass


# == Script Configuration ======================================================
# Set a seed to enable reproducibility
SEED = 1

# Get the username of the person who is running the script.
USERNAME = getpass.getuser()

# Set a format to the logs.
LOG_FORMAT = '[%(levelname)s | ' + USERNAME + ' | %(asctime)s] - %(message)s'

# Name of the file to store the logs.
LOG_FILENAME = 'script_execution.log'

# Level at which messages are logged. By default, logging defines the
# following levels, in increasing order of severity:
# 1. DEBUG: detailed information, useful only when diagnosing a problem.
# 2. INFO: confirmation that everything is working as it should.
# 3. WARNING: information that requires the user's attention.
# 4. ERROR: an error has occurred and the script cannot perform some function.
# 5. CRITICAL: a serious error has occurred and the script may stop running properly.
LOG_LEVEL = logging.INFO
# Setting a level also logs every message of higher severity. For example,
# with the level set to `INFO`, all `WARNING`, `ERROR` and `CRITICAL` messages
# are logged as well, but `DEBUG` messages are not.


# == Set up logging ============================================================
logging.basicConfig(
    level=LOG_LEVEL,
    format=LOG_FORMAT,
    force=True,  # Python 3.8+: replace any handlers already attached to the root logger
    datefmt="%Y-%m-%d %H:%M:%S",
    handlers=[logging.FileHandler(LOG_FILENAME, "a", "utf-8"),
              logging.StreamHandler()]
)


# == Script Start ==============================================================
# Log the script execution start
logging.info('Script started execution!')

# Read data from the Excel file
data = pd.read_excel('example.xlsx')

# Retrieve a sample with 50% of the rows from `data`.
# When a `random_state` is set, `pd.DataFrame.sample` will always return
# the same dataframe, given that `data` doesn't change.
sample_data = data.sample(frac=0.5, random_state=SEED)

# Other stuff
# ...

# Log when the script finishes execution
logging.info('Script finished execution!')


Running the script above prints the following messages to the console:

[INFO | erikingwersen | 2023-02-13 23:17:14] - Script started execution!
[INFO | erikingwersen | 2023-02-13 23:17:14] - Script finished execution!

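It also creates or updates a file named 'script_execution.log' containing the same messages that are printed to the console. Because the file name is a relative path, the file is placed in the current working directory, which is the script's directory only if you run the script from there.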
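One limitation of logging the finish message on the last line is that it is never written if the script raises an exception first. A minimal sketch of one way around this, assuming a helper named `logged_run` (hypothetical, not part of the script above), wraps the work in a context manager that logs the start, the outcome and the elapsed time:

```python
import io
import logging
import time
from contextlib import contextmanager


@contextmanager
def logged_run(logger: logging.Logger, name: str):
    """Log the start, the outcome and the elapsed time of a block of code."""
    start = time.perf_counter()
    logger.info("%s started", name)
    try:
        yield
    except Exception:
        # Logs at ERROR level and appends the traceback, then re-raises.
        logger.exception("%s failed", name)
        raise
    else:
        logger.info("%s finished", name)
    finally:
        logger.info("%s took %.3f s", name, time.perf_counter() - start)


# Demo logger writing to memory; a real script could reuse its own logger.
stream = io.StringIO()
run_logger = logging.getLogger("run_demo")  # hypothetical logger name
run_logger.setLevel(logging.INFO)
run_logger.propagate = False
run_logger.handlers.clear()
run_logger.addHandler(logging.StreamHandler(stream))

with logged_run(run_logger, "demo script"):
    total = sum(range(10))  # stands in for the real work

try:
    with logged_run(run_logger, "failing script"):
        raise ValueError("boom")
except ValueError:
    pass  # the failure has already been logged

print(stream.getvalue(), end="")
```

The elapsed time is measured with `time.perf_counter()`, while the wall-clock start and end times still come from `%(asctime)s` when the handler's format includes it.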

Answer 2

Score: 1

  1. To create a log

You could use Python's standard logging module.

Logging HOWTO — Python 3.11.2 documentation

import logging
logging.basicConfig(filename='example.log', encoding='utf-8', level=logging.DEBUG)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')
logging.error('And non-ASCII stuff, too, like Øresund and Malmö')

1.1 To know who ran the script

import getpass
getpass.getuser()

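1.2 To know when it ran

# %(asctime)s adds the timestamp of each record. clientip and user are custom
# fields, so their values must be supplied through `extra` on every call.
FORMAT = '%(asctime)s %(clientip)-15s %(user)-8s %(message)s'
logging.basicConfig(format=FORMAT)
d = {'clientip': '192.168.0.1', 'user': 'fbloggs'}
logger = logging.getLogger('tcpserver')
logger.warning('Protocol problem: %s', 'connection reset', extra=d)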
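Passing `extra` on every call is easy to forget, and a call without it fails to format because the `user` field is missing. One possible way to bind the value once is `logging.LoggerAdapter`, sketched below. The logger name `adapter_demo` and the variable names are assumptions for illustration, and the fixed user `'fbloggs'` is reused from the snippet above; `getpass.getuser()` could be substituted for it.

```python
import io
import logging

# Handler writing to memory, with a timestamp and a custom `user` field.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter(
        "%(asctime)s %(user)-8s %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
)

base_logger = logging.getLogger("adapter_demo")  # hypothetical logger name
base_logger.setLevel(logging.INFO)
base_logger.propagate = False
base_logger.handlers.clear()
base_logger.addHandler(handler)

# The adapter supplies `extra` on every call, so callers do not have to.
audit_log = logging.LoggerAdapter(base_logger, {"user": "fbloggs"})
audit_log.info("Script started execution!")
audit_log.warning("Protocol problem: %s", "connection reset")

print(stream.getvalue(), end="")
```

Each line starts with the timestamp produced by `%(asctime)s`, followed by the bound user and the message.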
  2. To create a seed that you can share with another person so they get the same result

You can use the random_state parameter:

df['one_col'].sample(n=10, random_state=1)
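To illustrate what sharing the seed buys you, here is a small sketch that builds a DataFrame in memory (the data and the `SEED` constant are made up for the demo) and draws the same sample twice. It assumes both people sample the same data in the same row order; using the same pandas version is the safest assumption.

```python
import pandas as pd

SEED = 1  # the value to share with whoever reruns the script

# Stand-in data; in the question this would come from pd.read_excel.
df = pd.DataFrame({"one_col": range(100)})

first = df["one_col"].sample(n=10, random_state=SEED)
second = df["one_col"].sample(n=10, random_state=SEED)

# Same seed and same data: the two samples contain the same rows in the same order.
print(first.equals(second))
```

Logging the seed at the start of the run, next to the username and timestamp, keeps a record of which value produced a given result.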

Posted by huangapple on 2023-02-14 09:10:14
Source: https://go.coder-hub.com/75442598.html