英文:
What is '\u200d1500'?
问题
以下是代码部分的翻译:
I crawled data from a website which was in string format I replaced string character and now data only contains number. But when I want to convert this column to numeric I get that error. I have two columns which first is previous_prices other is now_prices. If now the product is not on sale program fill nas with previous_prices. Previous_prices type is int64, now_prices type is object. Error is: ValueError: invalid literal for int() with base 10: '\u200d1500'.
实际上,我看到了一个类似的问题,但那个问题与 ''\u200d1500'' 不相关。
now_prices_after_fillna |
---|
1450 |
1500 |
700 |
1700 |
2090 |
当我将 now_prices 更改为整数,然后使用 previous_prices 填充缺失值时,一般数据类型变为整数。但当我尝试将数据导出到 Excel 时,出现了此错误。我无法理解问题。
英文:
I crawled data from a website which was in string format I replaced string character and now data only contains number. But when I want to convert this column to numeric I get that error. I have two columns which first is previous_prices other is now_prices. If now the product is not on sale program fill nas with previous_prices. Previous_prices type is int64, now_prices type is object. Error is: ValueError: invalid literal for int() with base 10: '\u200d1500'.
Actually I saw a similiar question but that question is not relevant to '\u200d1500'.
now_prices_after_fillna |
---|
1450 |
1500 |
700 |
1700 |
2090 |
There are strange situation when When I change now_prices to integer and then fill na with previous_prices general data type was int. But when I want to export that data to excel I get this error. I can not understand problem.
答案1
得分: 2
因为\u200d
是不可打印字符,以下是去除它并将其转换为整数的解决方案:
df = pd.DataFrame({'now_prices_after_fillna':['1450', u'\u200d1500']})
print(df)
now_prices_after_fillna
0 1450
1 1500
# https://stackoverflow.com/a/54451873/2901002
import sys
# 构建一个将所有不可打印字符映射到None的表
NOPRINT_TRANS_TABLE = {
i: None for i in range(0, sys.maxunicode + 1) if not chr(i).isprintable()
}
def make_printable(s):
"""替换字符串中的不可打印字符。"""
# str的translate方法从字符串中删除映射到None的字符
return s.translate(NOPRINT_TRANS_TABLE)
df['now_prices_after_fillna'] = (df['now_prices_after_fillna'].apply(make_printable)
.astype(int))
print(df)
now_prices_after_fillna
0 1450
1 1500
如果混合了数字和字符串值,可以尝试使用try
和except
语句:
df = pd.DataFrame({'now_prices_after_fillna':['1450', u'\u200d1500', 1000]})
print(df)
# https://stackoverflow.com/a/54451873/2901002
import sys
# 构建一个将所有不可打印字符映射到None的表
NOPRINT_TRANS_TABLE = {
i: None for i in range(0, sys.maxunicode + 1) if not chr(i).isprintable()
}
def make_printable(s):
"""替换字符串中的不可打印字符。"""
# str的translate方法从字符串中删除映射到None的字符
try:
return s.translate(NOPRINT_TRANS_TABLE)
except AttributeError:
return s
df['now_prices_after_fillna'] = (df['now_prices_after_fillna'].apply(make_printable)
.astype(int))
print(df)
now_prices_after_fillna
0 1450
1 1500
2 1000
测试你的真实数据:
df = pd.read_excel('your_updated_file2222.xlsx')
# https://stackoverflow.com/a/54451873/2901002
import sys
# 构建一个将所有不可打印字符映射到None的表
NOPRINT_TRANS_TABLE = {
i: None for i in range(0, sys.maxunicode + 1) if not chr(i).isprintable()
}
def make_printable(s):
"""替换字符串中的不可打印字符。"""
# str的translate方法从字符串中删除映射到None的字符
try:
return s.translate(NOPRINT_TRANS_TABLE)
except AttributeError:
return s
df['price'] = df['price'].apply(make_printable).astype(int)
print(df)
price
0 1450
1 1500
2 700
3 1700
4 2090
.. ...
206 1500
207 1290
208 1500
209 1560
210 1800
[211 行 x 1 列]
英文:
Because \u200d
is not printable character, here is solution for remove it and converting to integers:
df = pd.DataFrame({'now_prices_after_fillna':['1450', u'\u200d1500']})
print (df)
now_prices_after_fillna
0 1450
1 1500
#https://stackoverflow.com/a/54451873/2901002
import sys
# build a table mapping all non-printable characters to None
NOPRINT_TRANS_TABLE = {
i: None for i in range(0, sys.maxunicode + 1) if not chr(i).isprintable()
}
def make_printable(s):
"""Replace non-printable characters in a string."""
# the translate method on str removes characters
# that map to None from the string
return s.translate(NOPRINT_TRANS_TABLE)
df['now_prices_after_fillna'] = (df['now_prices_after_fillna'].apply(make_printable)
.astype(int))
print (df)
now_prices_after_fillna
0 1450
1 1500
Another idea if mixed numeric with strings values add try
with except statement:
df = pd.DataFrame({'now_prices_after_fillna':['1450', u'\u200d1500', 1000]})
print (df)
#https://stackoverflow.com/a/54451873/2901002
import sys
# build a table mapping all non-printable characters to None
NOPRINT_TRANS_TABLE = {
i: None for i in range(0, sys.maxunicode + 1) if not chr(i).isprintable()
}
def make_printable(s):
"""Replace non-printable characters in a string."""
# the translate method on str removes characters
# that map to None from the string
try:
return s.translate(NOPRINT_TRANS_TABLE)
except AttributeError:
return s
df['now_prices_after_fillna'] = (df['now_prices_after_fillna'].apply(make_printable)
.astype(int))
print (df)
now_prices_after_fillna
0 1450
1 1500
2 1000
Test your real data:
df = pd.read_excel('your_updated_file2222.xlsx')
#https://stackoverflow.com/a/54451873/2901002
import sys
# build a table mapping all non-printable characters to None
NOPRINT_TRANS_TABLE = {
i: None for i in range(0, sys.maxunicode + 1) if not chr(i).isprintable()
}
def make_printable(s):
"""Replace non-printable characters in a string."""
# the translate method on str removes characters
# that map to None from the string
try:
return s.translate(NOPRINT_TRANS_TABLE)
except AttributeError:
return s
df['price'] = df['price'].apply(make_printable).astype(int)
print (df)
price
0 1450
1 1500
2 700
3 1700
4 2090
.. ...
206 1500
207 1290
208 1500
209 1560
210 1800
[211 rows x 1 columns]
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论