PHP: validate the lines of a "text" file while extracting some stats at the same time?

Question

I have a file (from a POST request) that I would like to validate against some constraints:

  • All lines must be composed of ASCII printable characters only.
  • There must be at least one XYZ record (lines that start with @XYZ ).
  • There must be at most 999999 XYZ records.

For that purpose I made a generic function that reads a file in chunks and passes each line to a callback for validation:

/*
 * Iterates over each line of the file, passing them to the callback function for validation.
 * When the callback function returns false, or when there is an error,
 * the validation process ends.
 * 
 * @param string   $filename       The name of the file to validate.
 * @param callable $callback       The callback function to use for validating each line.
 * @param string   $line_delimiter The line-ending delimiter (default is "\n").
 * @param integer  $buffer_size    The maximum number of bytes to read from the file at a time (default is 8192).
 *
 * @return Returns true when $callback returned true for each line, false if not, and NULL on error.
 *
 * @warning When $buffer_size is not large enough to contain a whole line, $callback will validate chunks of lines.
 */
function validate_file_lines($filename, $callback, $line_delimiter = "\n", $buffer_size = 8192)
{
    $handle = fopen($filename, 'rb');
    $is_valid = (false === $handle ? null : true);

    $remainder = '';

    while ( $is_valid && !feof($handle) )
    {
        $buffer = fread($handle, $buffer_size);

        if ( false === $buffer )
        {
            $is_valid = null;
        }
        else
        {
            $lines_array = explode($line_delimiter, $buffer);
            $lines_array_key_last = count($lines_array) - 1;

            $lines_array[0] = $remainder . $lines_array[0];

            if ( $lines_array_key_last !== 0 )
            {
                $remainder = $lines_array[$lines_array_key_last];
                unset($lines_array[$lines_array_key_last]);
            }

            foreach ( $lines_array as $line )
            {
                $is_valid = $callback($line);
                if ( ! $is_valid )
                    break;
            }
        }
    }
    @fclose($handle);
    return $is_valid;
}

Now, using it, I'm trying to validate a file, for example:

HEAD good
@XYZ 1
@XYZ 1
%END

HEAD better
@XYZ 2 2
%END

$xyz_count = 0;
$xyz_min = 1;
$xyz_max = 999999;

$is_valid_line = function($line) use(&$xyz_count, $xyz_max) {
    $is_valid = true;
    if ( ctype_print($line) )
    {
        if ( substr($line, 0, 6) === '@XYZ ' )
        {
            ++$xyz_count;
            $is_valid = $xyz_count <= $xyz_max;
        }
    }
    else if ( '' !== @$line[0] )
    {
        $is_valid = false;
    }
    return $is_valid;
};

var_dump(
    validate_file_lines('file.txt', $is_valid_line) && $xyz_count >= $xyz_min
);

The current output is:

bool(false)

While I'm expecting:

bool(true)

What am I doing wrong?


ASIDE

Does the SPL provide any class for iterating over file lines?

Answer 1

Score: 1

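The prefix '@XYZ ' is 5 characters long, so your substr() needs to extract 5 characters, not 6. With a length of 6, the comparison fails for every record that has anything after the prefix, so $xyz_count stays at 0 and the final check $xyz_count >= $xyz_min is false. You can also use fgets() to read the file line by line instead of splitting chunks yourself, and plain r is enough for the open mode. Below is a bare-bones solution that should work.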
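To make the length mismatch concrete, here is a minimal check using one of the sample lines from the question. The variable names are only illustrative, and strncmp() is shown as one possible alternative rather than a required change:

```php
<?php
$line = '@XYZ 1';

// Six characters are extracted ('@XYZ 1'), which differs from the five-character prefix.
$matches_with_six = (substr($line, 0, 6) === '@XYZ ');
var_dump($matches_with_six); // bool(false)

// Five characters are extracted ('@XYZ '), which equals the prefix.
$matches_with_five = (substr($line, 0, 5) === '@XYZ ');
var_dump($matches_with_five); // bool(true)

// strncmp() compares only the first 5 bytes, without building a substring.
$matches_with_strncmp = (strncmp($line, '@XYZ ', 5) === 0);
var_dump($matches_with_strncmp); // bool(true)
```

Either five-character form works; strncmp() keeps the length next to the prefix, which makes a mismatch like this easier to spot.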

You can also add debug printing to show which line fails.

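<?php
$filename = 'file.txt';
$xyz_max = 999999;

$fh = fopen($filename, 'r');
$valid = (false !== $fh);
$xyz_count = 0;
while ($valid && false !== ($line = fgets($fh))) {
    // fgets() keeps the trailing "\n", which ctype_print() would reject.
    $line = rtrim($line, "\n");
    // Empty lines are accepted, as in the question; ctype_print('') returns false.
    if ('' !== $line && !ctype_print($line)) $valid = false;
    if (substr($line, 0, 5) === '@XYZ ') $xyz_count++;
    // At most $xyz_max records are allowed, so only fail once the count exceeds it.
    if ($xyz_count > $xyz_max) $valid = false;

    // if (!$valid) echo "LINE (fail): {$line}";
}
if ($xyz_count === 0) $valid = false;
if (false !== $fh) fclose($fh);
var_dump($valid);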
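
On the aside about the SPL: SplFileObject can iterate over a file line by line. The following is only a sketch under some assumptions. The function name validate_lines_spl is hypothetical, the callback is a simplified rewrite of the one in the question, and the sample data is written to a temporary file so that the example is self-contained.

```php
<?php
// Hypothetical helper: passes every line of the file to $callback.
// Returns true if all lines pass, false if one fails, null if the file cannot be opened.
function validate_lines_spl(string $filename, callable $callback): ?bool
{
    try {
        $file = new SplFileObject($filename, 'r');
    } catch (RuntimeException $e) {
        return null;
    }
    // DROP_NEW_LINE strips the line ending from each line before it is returned.
    $file->setFlags(SplFileObject::DROP_NEW_LINE);

    foreach ($file as $line) {
        if (!$callback($line)) {
            return false;
        }
    }
    return true;
}

$xyz_count = 0;
$xyz_min = 1;
$xyz_max = 999999;

// Simplified version of the question's callback (an assumption, not the original code).
$is_valid_line = function (string $line) use (&$xyz_count, $xyz_max): bool {
    if ('' === $line) {
        return true; // empty lines are accepted, as in the question
    }
    if (!ctype_print($line)) {
        return false;
    }
    if (strncmp($line, '@XYZ ', 5) === 0) {
        ++$xyz_count;
    }
    return $xyz_count <= $xyz_max;
};

// Write the sample file from the question to a temporary location.
$path = tempnam(sys_get_temp_dir(), 'xyz');
file_put_contents($path, "HEAD good\n@XYZ 1\n@XYZ 1\n%END\n\nHEAD better\n@XYZ 2 2\n%END\n");

$result = validate_lines_spl($path, $is_valid_line) && $xyz_count >= $xyz_min;
var_dump($result);

unlink($path);
```

Because SplFileObject handles the buffering, there is no remainder to carry between chunks, which removes the bookkeeping that validate_file_lines() does by hand.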

Posted on July 3, 2023 at 03:13:12. Source: https://go.coder-hub.com/76600430.html
