Extracting a password from a String using regular expressions

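Question

I am trying to solve an exercise in which I have to extract the passwords from a given text and print them. The rules are:

> a password consists of digits and Latin upper- and lowercase letters;
> a password always follows the word "password" (which can be written in upper- or lowercase letters), but can be separated from it by any number of whitespace characters and colon `:` characters.

My problem is that I need to make sure the password is preceded by the word "password", any number of whitespace characters, and a colon, yet I must print only the password.

For example, if the input is:

    My email javacoder@gmail.com with password     SECRET115. Here is my old PASSWORD: PASS111.

The output should be:

    SECRET115
    
    PASS111

I stumbled upon lookaheads and lookbehinds and tried them in my regex: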

```java
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Main {

    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        String text = scanner.nextLine();
        Pattern pattern = Pattern.compile("(?<=password[\\s:]*)\\w*(?=\\W)", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        if (!matcher.find()) {
            System.out.println("No passwords found.");
        }
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}
```

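This solution does print only the passwords, but it also prints extra blank lines for no apparent reason. The output for the above input looks like this:

    SECRET115



    PASS111

Also, when I change the regex to `"(?<=password[\\s:]*)\\w{5,}(?=\\W)"` so that only passwords of length at least 5 are accepted, the program outputs just:

    PASS111

The other password is clearly longer than 5 characters, so why was it left out?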




# Answer 1
**Score**: 2

You can use the regex `(?<=password|password:)\\s*(\\p{Alnum}+)`, which is simple to understand and gives you the result precisely.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The lookbehind requires "password" or "password:" right before the match.
        Pattern pattern = Pattern.compile("(?<=password|password:)\\s*(\\p{Alnum}+)", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern
                .matcher("My email javacoder@gmail.com with password     SECRET115. Here is my old PASSWORD: PASS111.");

        while (matcher.find()) {
            // group(1) holds only the alphanumeric password, without leading whitespace.
            System.out.println(matcher.group(1));
        }
    }
}
```

Output:

    SECRET115
    PASS111

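Explanation of the regex:

  1. `\p{Alnum}` matches an alphanumeric character; it is one of the POSIX character classes described in the `java.util.regex.Pattern` documentation. Note that you should not use `\w` for your requirement because, besides letters and digits, it also matches the underscore (`_`).
  2. The regex uses a positive lookbehind, `(?<=password|password:)`, to assert that `\\s*(\\p{Alnum}+)` is immediately preceded by `password` or `password:`.
  3. The desired result comes from `group(1)`, which is the capturing group `(\\p{Alnum}+)`.

If you are not comfortable with `\p{Alnum}`, you can use `[A-Za-z0-9]` instead.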
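
To make the difference between `\w` and `\p{Alnum}` concrete, here is a minimal sketch. The class name `AlnumVersusWord`, the helper `capture`, and the input containing an underscore are assumptions made for illustration; they do not come from the exercise. The sketch uses a plain capturing group after `password[\s:]*` instead of a lookbehind so that it stays focused on the character classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlnumVersusWord {
    // Hypothetical helper: collects group(1) of every match of the regex in the text.
    static List<String> capture(String regex, String text) {
        Matcher matcher = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up input with an underscore, which the exercise does not allow in a password.
        String text = "password: abc_123.";
        System.out.println(capture("password[\\s:]*(\\w+)", text));         // [abc_123]
        System.out.println(capture("password[\\s:]*(\\p{Alnum}+)", text));  // [abc]
        System.out.println(capture("password[\\s:]*([A-Za-z0-9]+)", text)); // [abc]
    }
}
```

With `\w+` the underscore and the digits after it are swallowed into the match, while `\p{Alnum}+` and `[A-Za-z0-9]+` both stop at the underscore.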


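# Answer 2
**Score**: 2

When your regex contains `\w*`, it first matches an empty string. That match is never shown because the `matcher.find()` call in the `if` condition consumes it. When you use `\w{5,}`, the first match is `SECRET115`, and it is consumed the same way, so it is not displayed.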
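
The effect of the extra `find()` call can be reproduced in isolation. The following is a minimal sketch: the class name `FindConsumesMatch` and both helper methods are hypothetical, and the pattern uses a capturing group instead of the question's lookbehind so that the demonstration does not depend on how a given Java version handles lookbehinds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindConsumesMatch {
    // Hypothetical helper: same control flow as the question's code, with one
    // find() call in a check before the loop. Returns what the loop would print.
    static List<String> withExtraFind(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        List<String> printed = new ArrayList<>();
        if (!matcher.find()) {      // this call already moves past the first match
            return printed;
        }
        while (matcher.find()) {    // so the loop starts at the second match
            printed.add(matcher.group(1));
        }
        return printed;
    }

    // Hypothetical helper: every find() call happens in the loop condition.
    static List<String> loopOnly(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        List<String> printed = new ArrayList<>();
        while (matcher.find()) {
            printed.add(matcher.group(1));
        }
        return printed;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("password[\\s:]*(\\w{5,})", Pattern.CASE_INSENSITIVE);
        String text = "My email javacoder@gmail.com with password     SECRET115. Here is my old PASSWORD: PASS111.";
        System.out.println(withExtraFind(pattern, text)); // [PASS111]
        System.out.println(loopOnly(pattern, text));      // [SECRET115, PASS111]
    }
}
```

Each successful `find()` advances the matcher, so a match found in a separate check is gone by the time the loop runs unless it is read with `group()` right away or the matcher is rewound with `reset()`.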

Use

    \bpassword[\s:]*(\w+)

Explanation

    NODE                     EXPLANATION
    --------------------------------------------------------------------------------
      \b                       the boundary between a word char (\w) and
                               something that is not a word char
    --------------------------------------------------------------------------------
      password                 'password'
    --------------------------------------------------------------------------------
      [\s:]*                   any character of: whitespace (\n, \r, \t,
                               \f, and " "), ':' (0 or more times
                               (matching the most amount possible))
    --------------------------------------------------------------------------------
      (                        group and capture to \1:
    --------------------------------------------------------------------------------
        \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                                 more times (matching the most amount
                                 possible))
    --------------------------------------------------------------------------------
      )                        end of \1

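Java code:

    import java.util.Scanner;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class Main {
        public static void main(String[] args) {
            Scanner scanner = new Scanner(System.in);
            String text = scanner.nextLine();
            Pattern pattern = Pattern.compile("\\bpassword[\\s:]*(\\w+)", Pattern.CASE_INSENSITIVE);
            Matcher matcher = pattern.matcher(text);
            boolean found = false;
            // Call find() only in the loop so that no match is skipped.
            while (matcher.find()) {
                System.out.println(matcher.group(1));
                found = true;
            }
            if (!found) {
                System.out.println("No passwords found.");
            }
        }
    }

Output:

    SECRET115
    PASS111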

# Answer 3
**Score**: 0

The first match is consumed by the `matcher.find()` call in the `if` condition, so output is produced only from the second match onward.

# Answer 4
**Score**: 0

Try matching `(?i)(password[\s:]*)(\w+)` and extract the second group from every match. Java does not reliably support lookbehinds of unbounded length such as `(?<=password[\s:]*)`, so a capturing group is the safer way to skip the prefix.
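
The suggestion above can be sketched as follows. The class name `SecondGroupExample` and the method `passwords` are hypothetical names chosen for this illustration; the regex is the one from the answer, written as a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SecondGroupExample {
    // (?i) switches on case-insensitive matching inside the pattern itself,
    // so no Pattern.CASE_INSENSITIVE flag is passed to compile().
    private static final Pattern PATTERN = Pattern.compile("(?i)(password[\\s:]*)(\\w+)");

    // Hypothetical helper: returns the second group of every match.
    static List<String> passwords(String text) {
        Matcher matcher = PATTERN.matcher(text);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            // Group 1 is the "password" prefix with its separators; group 2 is the password.
            result.add(matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "My email javacoder@gmail.com with password     SECRET115. Here is my old PASSWORD: PASS111.";
        System.out.println(passwords(text)); // [SECRET115, PASS111]
    }
}
```

Because the prefix is matched as ordinary text and then ignored, the pattern needs no lookbehind at all. Note that `\w` also accepts underscores; `[A-Za-z0-9]+` would follow the exercise's rules more strictly.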

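# Answer 5
**Score**: 0

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // The class name is illustrative; the original snippet showed only the members.
    public class PasswordExtractor {
        private static final Pattern PASSWORD_PATTERN =
            Pattern.compile("password\\s*:?\\s*(?<password>[A-Za-z0-9]+)", Pattern.CASE_INSENSITIVE);

        public static List<String> getAllPasswords(String str) {
            Matcher matcher = PASSWORD_PATTERN.matcher(str);
            List<String> passwords = new ArrayList<>();

            // Read each match through the named group "password".
            while (matcher.find()) {
                passwords.add(matcher.group("password"));
            }

            return passwords;
        }
    }

A demo is available at regex101.com.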


huangapple
  • Published on August 17, 2020, 02:33:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63440612.html