String to occurrences and back algorithm

Question

I have the string `aabcd`. I need to encode it as characters with their occurrence counts, and then convert the result back.

This is the first function I have:
```python
def string_to_occurances(s):
    dictionary = {}
    for i in range(int(len(s))):
        if s[i] in dictionary:
            dictionary[s[i]] += 1
        else:
            dictionary[s[i]] = 1
    return ''.join(f'{k}{v}' for k, v in dictionary.items())
```

In this case `aabcd` becomes `a2b1c1d1`, and now I wonder how to convert it back.

This is what I have so far:

```python
def is_digit(s):
    if s == '0' or s == '1' or s == '2' or s == '3' or s == '4' or s == '5' or s == '6' or s == '7' or s == '8' or s == '9':
        return True
    else:
        return False

s = 'a2b1c1d1'
last_chat = s[0]
index = len(s) - 1
digits = ''
for i in range(len(s) - 1, -1, -1):
    while index >= 0 and is_digit(s[index]):
        digits += s[index]
        index -= 1
```

Any suggestions?

Answer 1

Score: 2

Regex version

To handle cases like `a10b2`, I suggest using a regex with `re.finditer()` to find every number and its position in the string.

`re.finditer()` returns an iterator of `Match` objects. For the regex `r'\d+'` on the string `a10b2c1`, the matches are `10`, `2` and `1`, and each match has a `span()` method that returns a tuple with the start and end position of the match. For example:

import re
s = 'a10b2c1'
matches = re.finditer(r'\d+', s)
for match in matches:
    text = match.group()
    print(match)
# <re.Match object; span=(1, 3), match='10'>
# <re.Match object; span=(4, 5), match='2'>
# <re.Match object; span=(6, 7), match='1'>

From the span and the matched text we can work out each character and how many times it occurs:

import re

def occurances_to_string(s):
    matches = re.finditer(r'\d+', s)  # find all numbers in the string and their positions
    result = ''
    for match in matches:
        text = match.group()
        span_start, span_end = match.span()
        for i in range(0, int(text)):
            # the character to repeat sits just before the number
            result += s[span_start - 1:span_start]
    if not result:  # in case no number is found, e.g. 'abcd'
        result = s
    return result

print(occurances_to_string('a1b2c3'))
# abbccc
print(occurances_to_string('a10b2c3'))
# aaaaaaaaaabbccc
print(occurances_to_string('a3b1c1d1'))
# aaabcd
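
As a possible variation on the regex approach, a single pattern with two capture groups can pair each character with its count directly, so no index arithmetic on `span()` is needed. This is only a sketch, not part of the original answer: the name `decode_pairs` is made up for illustration, and it assumes every character is a non-digit immediately followed by its count.

```python
import re

def decode_pairs(s):
    # Hypothetical helper, not from the original answer.
    # (\D) captures one non-digit character, (\d+) captures its count.
    pairs = re.findall(r'(\D)(\d+)', s)
    return ''.join(char * int(count) for char, count in pairs)

print(decode_pairs('a1b2c3'))   # abbccc
print(decode_pairs('a10b2c3'))  # aaaaaaaaaabbccc
```

Unlike the function above, this sketch returns an empty string for input without any counts, such as 'abcd'.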

Non-regex version

Iterate through the string and check whether each character is a digit. If it is, check whether the next character is a digit too, and repeat until the next character is not a digit. Then convert all the digits found to an int and repeat the preceding character that many times.

def occurances_to_string(s):
    result = ''
    i = 0
    while i < len(s):
        if s[i].isdigit():
            char = s[i-1]  # the character the count belongs to
            num = s[i]
            # collect any further digits of a multi-digit count
            while i+1 < len(s) and s[i+1].isdigit():
                num += s[i+1]
                i += 1
            for j in range(int(num)):
                result += char
        i += 1
    if not result:  # no digits found, return the input unchanged
        result = s
    return result

print(occurances_to_string('a1b2c3')) # abbccc
print(occurances_to_string('a10b2c3')) # aaaaaaaaaabbccc
print(occurances_to_string('abcd')) # abcd
print(occurances_to_string('1a3b1c1d1')) # 1aaabcd (at i == 0, s[i-1] is s[-1], the trailing '1')
print(occurances_to_string('a2b12c13')) # aabbbbbbbbbbbbccccccccccccc
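
If input such as '1a3b1c1d1' should be rejected rather than decoded, one option is to validate the whole string before expanding it. The following is a minimal sketch under the assumption that a valid encoding is a sequence of one non-digit character followed by one or more digits; `decode_strict` is a hypothetical name, not something from the answer.

```python
import re

def decode_strict(s):
    # Hypothetical helper: accept only a sequence of
    # <non-digit character><one or more digits> pairs.
    if not re.fullmatch(r'(?:\D\d+)*', s):
        raise ValueError(f'not a valid encoding: {s!r}')
    return ''.join(c * int(n) for c, n in re.findall(r'(\D)(\d+)', s))

print(decode_strict('a2b3'))  # aabbb
try:
    decode_strict('1a3b1')
except ValueError as err:
    print(err)  # not a valid encoding: '1a3b1'
```

Whether to raise or to pass unexpected input through unchanged is a design choice; the answer's function takes the lenient route.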

Answer 2

Score: 1

You can't just use a dictionary where the keys are the characters in the string because that will only work for repeated values that are adjacent in the input string. Build a list instead.

You can still use a dictionary, depending on the output you need:

s = 'aabcda'

output = [[s[0], 1]]
d = {s[0]: 1}

for c in s[1:]:
    d[c] = d.get(c, 0) + 1
    if c == output[-1][0]:
        output[-1][1] += 1
    else:
        output.append([c, 1])

print(''.join([f'{k}{v}' for k, v in d.items()]))
print(''.join([f'{c}{n}' for c, n in output]))
print(''.join([f'{c*n}' for c, n in output]))

Output:

a3b1c1d1
a2b1c1d1a1
aabcda

Note:

It is impossible to convert 'a3b1c1d1' back to 'aabcda': the per-character counts do not record where each character occurred.
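
To illustrate, here is a small sketch showing two different strings that produce the same dictionary-based encoding. The helper `count_encode` is hypothetical and simply mirrors the dictionary approach.

```python
def count_encode(s):
    # Hypothetical helper mirroring the dictionary approach:
    # count each character, keeping first-seen order.
    counts = {}
    for c in s:
        counts[c] = counts.get(c, 0) + 1
    return ''.join(f'{k}{v}' for k, v in counts.items())

print(count_encode('aabcda'))  # a3b1c1d1
print(count_encode('aaabcd'))  # a3b1c1d1
```

Since both inputs map to the same output, no decoder can tell which one was the original.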

If repeated characters are only ever adjacent in the input string then we can implement encode and decode functions simply as:

def encode(s):
    d = {}
    for c in s:
        d[c] = d.get(c, 0) + 1
    return ''.join(f'{k}{v}' for k, v in d.items())

def decode(s):
    d = []
    for c in s:
        if c.isdecimal():
            # shift the count so far one decimal place and add the new digit
            d[-1][1] *= 10
            d[-1][1] += int(c)
        else:
            d.append([c, 0])
    return ''.join(f'{c*n}' for c, n in d)

# named text rather than str, to avoid shadowing the built-in
text = 'aabcddzzzzzzzzzzyy'
encoded = encode(text)
print(encoded)
decoded = decode(encoded)
print(decoded)

assert text == decoded

Output:

a2b1c1d2z10y2
aabcddzzzzzzzzzzyy
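
If the input may contain the same character in several separate runs, a run-length encoding (one count per run rather than one per character) stays reversible. The following is a sketch using `itertools.groupby` from the standard library; it is not part of the original answer, the names `rle_encode` and `rle_decode` are illustrative, and it assumes the input itself contains no digit characters.

```python
from itertools import groupby
import re

def rle_encode(s):
    # groupby yields one group per run of equal adjacent characters.
    return ''.join(f'{char}{len(list(run))}' for char, run in groupby(s))

def rle_decode(encoded):
    # Each run is one non-digit character followed by its count.
    return ''.join(c * int(n) for c, n in re.findall(r'(\D)(\d+)', encoded))

text = 'aabcda'
encoded = rle_encode(text)
print(encoded)              # a2b1c1d1a1
print(rle_decode(encoded))  # aabcda
```

The trade-off is that the encoded form is longer whenever a character appears in more than one run.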

Answer 3

Score: 0

Since the encoded input only holds the characters and their counts, you normally can't recover the original string.

For example, if the input is `a2b1c1d1`, the original string may be `aabcd`, `abacd`, `abcad`, `baacd`, etc.

It's somewhat like hashing: you can go from one end to the other, but you can't go back.
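
To put a number on that ambiguity: the count of distinct strings that share one set of character counts is the multinomial coefficient n! / (k1! * k2! * ...), where n is the total length and each k is one character's count. A sketch, with `arrangements` as a made-up helper name, assuming each character appears once in the encoded string:

```python
import re
from math import factorial

def arrangements(encoded):
    # Hypothetical helper: number of distinct strings whose
    # per-character counts match the given encoding.
    counts = [int(n) for _, n in re.findall(r'(\D)(\d+)', encoded)]
    total = factorial(sum(counts))
    for k in counts:
        total //= factorial(k)
    return total

print(arrangements('a2b1c1d1'))  # 60
```

So even this five-character example has 60 candidate originals.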

But if the characters in the input are always in order, then you can!

Just multiply each character by its count:

def occurances_to_string(s):
    result = ""
    temp = ""
    for i in range(0, len(s)):
        if s[i].isdigit():
            # each digit is applied on its own, so this only handles single-digit counts
            result += temp * int(s[i])
            temp = ""
        else:
            temp += s[i]
    return result

# >>> occurances_to_string("a2b1c1d1")
# aabcd

Hope it helps

huangapple
  • Posted on May 29, 2023, 15:33:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76355446.html