March 7, 2023 13:43:21

What is '\u200d1500'?

Question

I crawled data from a website. The values were strings, so I replaced the non-numeric characters, and the data now appears to contain only numbers. But when I try to convert the column to a numeric type, I get an error. I have two columns: previous_prices and now_prices. If a product is not currently on sale, the program fills the missing values in now_prices with previous_prices. The dtype of previous_prices is int64 and the dtype of now_prices is object. The error is: ValueError: invalid literal for int() with base 10: '\u200d1500'.

Actually, I saw a similar question, but it is not relevant to '\u200d1500'.

now_prices_after_fillna
1450
‍1500
700
1700
2090

Strangely, when I converted now_prices to integer and then filled the missing values with previous_prices, the resulting dtype was int. But when I try to export the data to Excel, I get this error. I cannot understand the problem.

Answer 1

Score: 2

Because \u200d (zero width joiner) is a non-printable character, here is a solution that removes it and converts the column to integers:

import sys
import pandas as pd

df = pd.DataFrame({'now_prices_after_fillna': ['1450', '\u200d1500']})
print(df)
#   now_prices_after_fillna
# 0                    1450
# 1                   ‍1500

# https://stackoverflow.com/a/54451873/2901002
# build a table mapping all non-printable characters to None
NOPRINT_TRANS_TABLE = {
    i: None for i in range(0, sys.maxunicode + 1) if not chr(i).isprintable()
}

def make_printable(s):
    """Remove non-printable characters from a string."""
    # the translate method on str removes characters
    # that map to None from the string
    return s.translate(NOPRINT_TRANS_TABLE)

df['now_prices_after_fillna'] = (df['now_prices_after_fillna'].apply(make_printable)
                                                              .astype(int))
print(df)
#    now_prices_after_fillna
# 0                     1450
# 1                     1500
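To see why the string looks like a plain number yet fails in int(), it can help to list its characters one by one. The following is a minimal sketch using only the standard library; the helper name invisible_chars is hypothetical and not part of the original answer.

```python
import unicodedata

value = "\u200d1500"  # the sample value from the question

# Show each character's code point, Unicode name, and general category.
for ch in value:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))

def invisible_chars(s):
    """Return the non-printable characters found in s (hypothetical helper)."""
    return [ch for ch in s if not ch.isprintable()]

print(invisible_chars(value))
```

The first character is reported as ZERO WIDTH JOINER with category Cf (format), which is why it is invisible in the printed column but still rejected by int().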

If the column mixes numeric and string values, another idea is to add a try/except statement:

import sys
import pandas as pd

df = pd.DataFrame({'now_prices_after_fillna': ['1450', '\u200d1500', 1000]})
print(df)

# https://stackoverflow.com/a/54451873/2901002
# build a table mapping all non-printable characters to None
NOPRINT_TRANS_TABLE = {
    i: None for i in range(0, sys.maxunicode + 1) if not chr(i).isprintable()
}

def make_printable(s):
    """Remove non-printable characters from a string; return non-strings unchanged."""
    # the translate method on str removes characters
    # that map to None from the string
    try:
        return s.translate(NOPRINT_TRANS_TABLE)
    except AttributeError:
        # numbers have no translate method, so pass them through
        return s

df['now_prices_after_fillna'] = (df['now_prices_after_fillna'].apply(make_printable)
                                                              .astype(int))
print(df)
#    now_prices_after_fillna
# 0                     1450
# 1                     1500
# 2                     1000
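As a possible alternative to the translation table, the cleanup can be done with pandas string methods. This is only a sketch, and it assumes that every price is a non-negative whole number: the regular expression keeps ASCII digits only, so it would also strip decimal points and minus signs. It is a swapped-in technique, not the method from the answer above.

```python
import pandas as pd

df = pd.DataFrame({"now_prices_after_fillna": ["1450", "\u200d1500", 1000]})

cleaned = (
    df["now_prices_after_fillna"]
    .astype(str)                             # turn every cell into a string, including the int 1000
    .str.replace(r"[^0-9]", "", regex=True)  # drop everything that is not an ASCII digit
)

# errors="coerce" turns anything that still cannot be parsed into a missing value;
# the nullable Int64 dtype can hold integers alongside missing values.
df["now_prices_after_fillna"] = pd.to_numeric(cleaned, errors="coerce").astype("Int64")
print(df)
```

Using the nullable Int64 dtype rather than int means a cell that ends up empty after cleaning becomes a missing value instead of raising an error.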

Test your real data:

import sys
import pandas as pd

df = pd.read_excel('your_updated_file2222.xlsx')

# https://stackoverflow.com/a/54451873/2901002
# build a table mapping all non-printable characters to None
NOPRINT_TRANS_TABLE = {
    i: None for i in range(0, sys.maxunicode + 1) if not chr(i).isprintable()
}

def make_printable(s):
    """Remove non-printable characters from a string; return non-strings unchanged."""
    # the translate method on str removes characters
    # that map to None from the string
    try:
        return s.translate(NOPRINT_TRANS_TABLE)
    except AttributeError:
        # numbers have no translate method, so pass them through
        return s

df['price'] = df['price'].apply(make_printable).astype(int)
print(df)
#      price
# 0     1450
# 1     1500
# 2      700
# 3     1700
# 4     2090
# ..     ...
# 206   1500
# 207   1290
# 208   1500
# 209   1560
# 210   1800
# [211 rows x 1 columns]
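Before converting a real column, it may be useful to check which rows actually carry invisible characters. The sketch below uses a small stand-in DataFrame rather than the Excel file, and the helper name has_invisible is hypothetical.

```python
import pandas as pd

# Stand-in for the real column; these values are illustrative only.
df = pd.DataFrame({"price": ["1450", "\u200d1500", 700, "1700"]})

def has_invisible(value):
    """Return True if value is a string containing a non-printable character."""
    return isinstance(value, str) and not value.isprintable()

mask = df["price"].apply(has_invisible)

# repr() shows invisible characters in escaped form, so they become visible in the output.
print(df.loc[mask, "price"].map(repr))
```

Rows where the mask is True are the ones that would make astype(int) fail, so this can narrow down where the scraped data needs cleaning.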
