What is '\u200d1500'?

Question

I crawled data from a website. It arrived as strings, so I stripped the non-numeric characters, and the data now appears to contain only numbers. But when I try to convert this column to numeric, I get an error. I have two columns: previous_prices and now_prices. If a product is not currently on sale, the program fills the NaN values in now_prices with previous_prices. The dtype of previous_prices is int64 and the dtype of now_prices is object. The error is: ValueError: invalid literal for int() with base 10: '\u200d1500'.

I did see a similar question, but it is not relevant to '\u200d1500'.

now_prices_after_fillna
1450
‍1500
700
1700
2090

Strangely, when I convert now_prices to integer and then fill the NaN values with previous_prices, the resulting dtype is int. But when I try to export the data to Excel, I get this error. I cannot understand the problem.


Answer 1

Score: 2


'\u200d' is the zero-width joiner (U+200D), a non-printable character, so int() cannot parse the string. Here is a solution that removes it and converts the column to integers:

    import sys
    import pandas as pd

    df = pd.DataFrame({'now_prices_after_fillna': ['1450', '\u200d1500']})
    print(df)
    #   now_prices_after_fillna
    # 0                    1450
    # 1                    1500   (the invisible U+200D is still in front of 1500)

    # https://stackoverflow.com/a/54451873/2901002
    # Build a table mapping all non-printable characters to None
    NOPRINT_TRANS_TABLE = {
        i: None for i in range(0, sys.maxunicode + 1) if not chr(i).isprintable()
    }

    def make_printable(s):
        """Remove non-printable characters from a string."""
        # str.translate drops every character that maps to None
        return s.translate(NOPRINT_TRANS_TABLE)

    df['now_prices_after_fillna'] = (df['now_prices_after_fillna']
                                     .apply(make_printable)
                                     .astype(int))
    print(df)
    #    now_prices_after_fillna
    # 0                     1450
    # 1                     1500
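To see why int() rejects a value that looks like a plain number, it can help to inspect the string character by character. The following is a minimal sketch using only the standard library; the helper name describe_chars is made up for illustration and is not part of the answer's code.

```python
import unicodedata


def describe_chars(s):
    """Return (code point, Unicode category, name) for every character in s."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in s
    ]


value = "\u200d1500"  # same literal as in the error message

# print(value) shows only the digits; ascii() exposes the hidden character
print(ascii(value))
print(len(value))  # one more than the four visible digits

for code_point, category, name in describe_chars(value):
    print(code_point, category, name)

try:
    int(value)
except ValueError as exc:
    print("int() failed:", exc)
```

The first character is reported with category Cf (format), which str.isprintable() treats as non-printable. That is why the translation table above removes it.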

If the column mixes numeric and string values, add a try/except statement so that non-string values pass through unchanged:

    import sys
    import pandas as pd

    df = pd.DataFrame({'now_prices_after_fillna': ['1450', '\u200d1500', 1000]})
    print(df)

    # https://stackoverflow.com/a/54451873/2901002
    # Build a table mapping all non-printable characters to None
    NOPRINT_TRANS_TABLE = {
        i: None for i in range(0, sys.maxunicode + 1) if not chr(i).isprintable()
    }

    def make_printable(s):
        """Remove non-printable characters from a string."""
        # str.translate drops every character that maps to None
        try:
            return s.translate(NOPRINT_TRANS_TABLE)
        except AttributeError:
            # Not a string (e.g. an int), so there is nothing to strip
            return s

    df['now_prices_after_fillna'] = (df['now_prices_after_fillna']
                                     .apply(make_printable)
                                     .astype(int))
    print(df)
    #    now_prices_after_fillna
    # 0                     1450
    # 1                     1500
    # 2                     1000
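As a variation on the mixed-type case, here is a hedged sketch that uses an explicit isinstance check instead of try/except, and filters by Unicode category instead of building a table over the whole code point range. This is a different technique from the answer's translate table, and the name strip_invisible is hypothetical. It assumes that only format (Cf) and control (Cc) characters need to go.

```python
import unicodedata

# Assumption: only format (Cf) and control (Cc) characters are unwanted
INVISIBLE_CATEGORIES = {"Cf", "Cc"}


def strip_invisible(value):
    """Drop Cf/Cc characters from strings; return any other type unchanged."""
    if not isinstance(value, str):
        return value
    return "".join(
        ch for ch in value if unicodedata.category(ch) not in INVISIBLE_CATEGORIES
    )


mixed = ["1450", "\u200d1500", 1000]  # strings and an int, as in the example above
cleaned = [int(strip_invisible(v)) for v in mixed]
print(cleaned)
```

With a DataFrame, the same function could be passed to Series.apply in place of make_printable. The category filter is narrower than isprintable(), so characters such as non-breaking spaces would survive it.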

Tested with your real data:

    import sys
    import pandas as pd

    df = pd.read_excel('your_updated_file2222.xlsx')

    # https://stackoverflow.com/a/54451873/2901002
    # Build a table mapping all non-printable characters to None
    NOPRINT_TRANS_TABLE = {
        i: None for i in range(0, sys.maxunicode + 1) if not chr(i).isprintable()
    }

    def make_printable(s):
        """Remove non-printable characters from a string."""
        # str.translate drops every character that maps to None
        try:
            return s.translate(NOPRINT_TRANS_TABLE)
        except AttributeError:
            # Not a string (e.g. an int), so there is nothing to strip
            return s

    df['price'] = df['price'].apply(make_printable).astype(int)
    print(df)
    #      price
    # 0     1450
    # 1     1500
    # 2      700
    # 3     1700
    # 4     2090
    # ..     ...
    # 206   1500
    # 207   1290
    # 208   1500
    # 209   1560
    # 210   1800
    #
    # [211 rows x 1 columns]
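Before casting a real column, it may be useful to list which rows would fail. The sketch below relies on pd.to_numeric with errors="coerce", which turns unparseable values into NaN. The sample frame is made up for illustration, and the column name price is borrowed from the example above.

```python
import pandas as pd

# Made-up sample: one clean value, one with a hidden U+200D, one clean, one junk
df = pd.DataFrame({"price": ["1450", "\u200d1500", "700", "abc"]})

# Values that cannot be parsed become NaN instead of raising
converted = pd.to_numeric(df["price"], errors="coerce")

# Keep only the original values that failed to convert
bad_rows = df.loc[converted.isna(), "price"]

# ascii() makes hidden characters visible in the report
print(bad_rows.map(ascii))
```

Rows reported here can then be cleaned with make_printable, or inspected by hand if they contain something other than invisible characters.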

huangapple
  • Published on March 7, 2023 at 13:43:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75658406.html