PHP: count and separate text from a text file

Question

Let's say the text file contains the following:

<Amanda> Hi there, how are you?
<Jack> Hi, im fine
.
.
.
.
<Jack> see you later

I want to count the words each user has said. The output should look like this, for example:

Amanda: 50
Jack: 40

First, I don't want to count the <Amanda> or <Jack> tags themselves. Then I want to count every word each user said and store the totals in the variables Amanda and Jack.

This is what I have done so far:

    $usercount1 = 0;
    $usercount2 = 0;

    //Opens a file in read mode
    $file = fopen("logfile.txt", "r");
    //Gets each line till end of file is reached
    while (($line = fgets($file)) !== false) {
        //Splits each line into words
        $words = explode(" ", $line);
        $words = explode("<Amanda>", $line);
        //Counts each word
        $usercount1 = $usercount1 + count($words);
    }

    while (($line = fgets($file)) !== false) {
        //Splits each line into words
        $words = explode(" ", $line);
        //Counts each word
        $usercount2 = $usercount2 + count($words);
    }

Answer 1

Score: 2

As I understand the question, this is one possible solution.

// Input
$input = "<Amanda> Hi there, how are you?\n<Jack> Hi, im fine \n <Jack> see you later";

// Initialize counters
$amandaCount = 0;
$jackCount = 0;

// Split input by lines
$lines = explode("\n", $input);

// Loop over lines
foreach ($lines as $line) {
  // Remove user tags
  $cleanLine = preg_replace("/<.+?>/", "", $line);

  // Split line into words
  $words = str_word_count($cleanLine, 1);

  // Count words per user
  if (strpos($line, "<Amanda>") !== false) {
    $amandaCount += count($words);
  } elseif (strpos($line, "<Jack>") !== false) {
    $jackCount += count($words);
  }
}

// Output
echo "Amanda: $amandaCount\n";
echo "Jack: $jackCount\n";
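
The solution above works on a hard-coded string and two fixed names, while the question reads from logfile.txt. The following is a rough sketch of how the same idea might be applied to lines read from a file and to any number of users. The helper name countWordsPerUser and the inline sample lines are assumptions made for illustration; they are not part of the answer.

```php
<?php
// Hypothetical helper (not from the answer): counts words per user for
// chat lines shaped like "<Name> message text".
function countWordsPerUser(iterable $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        // Capture the name inside <...> and the text that follows it.
        if (preg_match('/^\s*<([^>]+)>(.*)$/', rtrim($line, "\r\n"), $m)) {
            $counts[$m[1]] = ($counts[$m[1]] ?? 0) + str_word_count($m[2]);
        }
    }
    return $counts;
}

// To read the file from the question instead (assumes logfile.txt exists):
// $counts = countWordsPerUser(file('logfile.txt', FILE_IGNORE_NEW_LINES));

// Sample lines standing in for the file contents.
$counts = countWordsPerUser([
    "<Amanda> Hi there, how are you?\n",
    "<Jack> Hi, im fine \n",
    "<Jack> see you later\n",
]);

foreach ($counts as $user => $count) {
    echo "$user: $count\n";
}
```

For the three sample lines this prints "Amanda: 5" and "Jack: 6". Note that str_word_count() only counts alphabetic words, so tokens made purely of digits would not be included in the totals.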

Answer 2

Score: 1

I would take a more general approach, so that you can analyze all users and simply exclude the ones on a blacklist.

  • First, go through all the lines and match the username and the text.
  • Then rebuild the data structure by iterating over the matches and adding up the word counts, skipping blacklisted users.

The blacklist is keyed by username, as in the code, because looking up a key is faster than searching the values.

$input = <<<'_TEXT'
<Amanda> Hi there, how are you?
<Jack> Hi, im fine
<Jack> see you later
<John> Hello World, my friends!
<Daniel> Foo!
_TEXT;
preg_match_all('/^<([^>]+)>(.*?)$/m', $input, $matches);

$blacklist = ['Amanda' => 1, 'Jack' => 1];
$words = [];
foreach ($matches[2] as $index => $match) {
    $user = $matches[1][$index];
    if (isset($blacklist[$user])) {
        continue;
    }
    $words[$user] = ($words[$user] ?? 0) + str_word_count($match);
}
print_r($words);

Output:

Array
(
    [John] => 4
    [Daniel] => 1
)
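
To illustrate the point about key lookups versus value lookups, here is a small sketch comparing the two forms of the blacklist. The variable names are made up for this example; only isset(), in_array() and array_flip() are standard PHP functions.

```php
<?php
// Illustrative names only; the users come from the sample input above.
$blacklistByKey   = ['Amanda' => 1, 'Jack' => 1]; // names stored as keys
$blacklistByValue = ['Amanda', 'Jack'];           // names stored as values

$user = 'Jack';

// Hash lookup on the key: the cost does not grow with the list size.
$foundByKey = isset($blacklistByKey[$user]);

// Linear scan over the values; the third argument enables strict comparison.
$foundByValue = in_array($user, $blacklistByValue, true);

// A plain list of names can be converted to the keyed form with array_flip().
$converted = array_flip($blacklistByValue); // ['Amanda' => 0, 'Jack' => 1]

var_dump($foundByKey, $foundByValue, isset($converted['Amanda']));
```

All three checks report true. For a two-name list the difference is negligible; the keyed form only starts to matter when the blacklist is large or is checked once per line of a long log.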

Answer 3

Score: 1

I would implement the blacklisted names in the regex to filter them out as early as possible.

A negative lookahead, (?!Amanda>|Jack>), ensures that Amanda and Jack are excluded.

The m pattern modifier changes the meaning of the ^ ("start of string" anchor) to be the "start of a line" anchor.
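
A minimal sketch of the effect of the m modifier, using a made-up two-line string rather than the chat data from the question:

```php
<?php
// Two lines of sample text (assumed for illustration).
$text = "<Amanda> Hi\n<Jack> Hello";

// Without the m modifier, ^ matches only at the very start of the string.
$withoutM = preg_match_all('/^<([^>]+)>/', $text, $a);

// With the m modifier, ^ also matches right after each newline.
$withM = preg_match_all('/^<([^>]+)>/m', $text, $b);

echo "$withoutM $withM\n";       // prints "1 2"
echo implode(', ', $b[1]), "\n"; // prints "Amanda, Jack"
```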

Parentheses around the name subpattern create capture group 1 (accessible as element [1]). \K resets the start of the full-string match, so the space-delimited words after the name are accessible as element [0].

Use destructuring syntax in the foreach() for convenient variables.

Code:

preg_match_all(
    '/^<((?!Amanda>|Jack>)[^>]+)> \K.+/m',
    $chat,
    $matches,
    PREG_SET_ORDER
);
$result = [];
foreach ($matches as [$words, $name]) {
    $result[$name] = ($result[$name] ?? 0) + str_word_count($words);
}
var_export($result);
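
The snippet above does not show how $chat is defined. As a sketch under that assumption, the following self-contained version supplies sample input (invented here, with John speaking twice so the accumulation is visible) and runs the same pattern:

```php
<?php
// Sample input (assumed; the answer does not define $chat).
$chat = <<<'TEXT'
<Amanda> Hi there, how are you?
<Jack> Hi, im fine
<John> Hello World, my friends!
<Daniel> Foo!
<John> Bye
TEXT;

preg_match_all(
    '/^<((?!Amanda>|Jack>)[^>]+)> \K.+/m',
    $chat,
    $matches,
    PREG_SET_ORDER
);

// With PREG_SET_ORDER each entry is [0 => text after the tag, 1 => user name].
$result = [];
foreach ($matches as [$words, $name]) {
    $result[$name] = ($result[$name] ?? 0) + str_word_count($words);
}
var_export($result);
```

With this input, $result ends up as John => 5 and Daniel => 1; the Amanda and Jack lines never match because the lookahead rejects them right after the opening bracket.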

huangapple
  • Published on April 17, 2023 at 16:19:49
  • If you repost this article, please keep the link: https://go.coder-hub.com/76033054.html