‘Columns must be same length as key’ error when trying .Split

Question

The code below runs fine with Python 3.8.10 but fails with Python 3.10. Any idea what the problem could be?

import pandas as pd
import requests

url = "https://coinmarketcap.com/new/"
page = requests.get(url,headers={'User-Agent': 'Mozilla/5.0'}, timeout=1)
pagedata = page.text
usecols = ["Name", "Symbol", "1h", "24h", "MarketCap"]


df = pd.read_html(page.text)[0]
df[["Name", "Symbol"]] = df["Name"].str.split(r"\d+", expand=True)

df = (df.rename(columns={"Fully Diluted Market Cap": "MarketCap"})[usecols]
          .sort_values("24h", ascending=False, key=lambda ser: ser.str.replace("%", "").astype(float))
          .replace(r"^$", "", regex=True)
     )

numcols = df.columns[~df.columns.isin(['Name'])]
df = df.head(5).to_markdown(index=True)
print (df)

Current Output:

Traceback (most recent call last):
  df[["Name", "Symbol"]] = df["Name"].str.split(r"\d+", expand=True)
  ....
  ....
  ValueError: Columns must be same length as key

Correct Output: (Output in Python 3.8)

|    | Name        | Symbol    | 1h     | 24h      | MarketCap   |
|---:|:------------|:----------|:-------|:---------|:------------|
|  3 | Shrekt      |4HREK      | 23.82% | 2536.51% | 342,357     |
|  8 | BLAZE       |TOKEN9BLZE | 1.07%  | 106.71%  | 3,828,088   |
| 26 | Goner27     |GONER      | 6.32%  | 88.09%   | 1,094,010   |
| 14 | Party Hat15 |PHAT       | 13.34% | 81.64%   | 60,136      |
| 29 | PepeChat    |30PPC      | 48.01% | 78.25%   | 431,159     |

Answer 1

Score: 3

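I think it has to do with one of the values found in the Name column: NOOT (BRC-20)4NOOT. That value contains two runs of digits (20 and 4), so splitting on \d+ yields three parts instead of two, and the three resulting columns cannot be assigned to the two keys ["Name", "Symbol"].

To handle this, we can split only on the last number found in each row of this column.

Replace this: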

df[["Name", "Symbol"]] = df["Name"].str.split(r"\d+", expand=True)

With this:

df[["Name", "Symbol"]] = df["Name"].str.split(r"\d+(?!.*\d)", expand=True)

Output:

print(df)

|    | Name        | Symbol   | 1h     | 24h      | MarketCap   |
|---:|:------------|:---------|:-------|:---------|:------------|
|  5 | Shrekt      | HREK     | 54.61% | 1124.57% | 159,013     |
| 10 | BLAZE TOKEN | BLZE     | 2.40%  | 109.53%  | 3,880,242   |
|  8 | CMC DOGE    | CMCDOGE  | 12.93% | 102.76%  | 169,492     |
| 28 | Goner       | GONER    | 1.37%  | 88.66%   | 1,050,089   |
|  4 | nomeme      | NOMEME   | 53.86% | 86.14%   | 4,603,393   |
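
To see why the original pattern fails and the lookahead version does not, here is a minimal sketch on a few sample strings. The values in `names` are hypothetical stand-ins modeled on the Name column from the question, not freshly scraped data.

```python
import pandas as pd

# Hypothetical sample values modeled on the "Name" column in the question.
names = pd.Series(["Shrekt4HREK", "Goner27GONER", "NOOT (BRC-20)4NOOT"])

# Splitting on every digit run: the last value has two runs ("20" and "4"),
# so the expanded result gets three columns instead of two.
wide = names.str.split(r"\d+", expand=True)

# Splitting only on the last digit run gives exactly two columns here.
fixed = names.str.split(r"\d+(?!.*\d)", expand=True)
fixed.columns = ["Name", "Symbol"]

print(wide.shape, fixed.shape)  # (3, 3) (3, 2)
print(fixed)
```

Note that the lookahead pattern assumes the symbol itself contains no digits; a symbol ending in a digit would be split in the wrong place.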

Answer 2

Score: 2

Do you have to keep using df as the DataFrame name? I think you can store the result of the split in a new DataFrame and use that one afterwards. Try just this:

newDF = df["Name"].str.split(r"\d+", expand=True)
print(newDF)
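
Inspecting the split result this way shows how many columns it produces. As a rough sketch (the sample values in `names` are hypothetical, modeled on the question's data), counting the digit runs per row also pinpoints which rows cause the extra column:

```python
import pandas as pd

# Hypothetical sample values modeled on the "Name" column in the question.
names = pd.Series(["Shrekt4HREK", "NOOT (BRC-20)4NOOT", "PepeChat30PPC"])

# The widest row decides how many columns expand=True produces.
parts = names.str.split(r"\d+", expand=True)
print(parts.shape[1])  # 3

# Rows with more than one digit run are the ones that break the assignment.
digit_runs = names.str.count(r"\d+")
offenders = names[digit_runs > 1]
print(offenders.tolist())  # ['NOOT (BRC-20)4NOOT']
```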

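Edit, fixed code:

df["Name"] = df["Name"].str.replace("(BRC-20)", "", regex=False)

Add this line before the split; it removes the literal text (BRC-20) from any name that contains it. Passing regex=False matches the parentheses literally, so no escaping is needed and the result does not depend on the pandas default for regex.
So the problem was not your Python version.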

Answer 3

Score: 1

I think you need to split the name on the value of the '#' column.
You can do it like this:

Create a function for splitting:

def splitting(num, strng):
    # Split the name on the row's number and keep the first two pieces.
    splitted = strng.split(str(num))
    return [splitted[0], splitted[1]]

Then apply the function and expand the new column into separate columns:

df["split"] = df.apply(lambda x: splitting(x['#'], x['Name']), axis=1)
df[['OnlyName','Symbol']] = pd.DataFrame(df["split"].tolist(), index=df.index)
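
A variation on the same idea, as a sketch only: the frame below is hypothetical, and it assumes the '#' column holds exactly the number fused between name and symbol. Using `str.rpartition` splits at the last occurrence of that number, so digits earlier in the name (such as the 20 in BRC-20) are left alone. The helper name `split_on_rank` is made up for this example.

```python
import pandas as pd

# Hypothetical rows; "#" is assumed to hold the number fused into "Name".
df = pd.DataFrame({"#": [4, 27],
                   "Name": ["NOOT (BRC-20)4NOOT", "Goner27GONER"]})

def split_on_rank(row):
    # rpartition cuts at the LAST occurrence of the number in the string.
    name, _, symbol = row["Name"].rpartition(str(row["#"]))
    return pd.Series({"OnlyName": name, "Symbol": symbol})

df[["OnlyName", "Symbol"]] = df.apply(split_on_rank, axis=1)
print(df[["OnlyName", "Symbol"]])
```

This still breaks if the symbol itself contains the same digits as the number, so it is worth checking the result against the raw column.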

Let me know if this helps.

huangapple
  • Posted on May 11, 2023
  • Please keep this link when reposting: https://go.coder-hub.com/76226066.html