Determining what language a string contains in a pandas DataFrame

Question

I am new to Pandas and Python.

My dataframe:

**df**

```text
Text
Best tv in 2020
utilizar un servicio sms gratuito
utiliser un tv pour netflix
```

**My desired output**

```text
Text                               Language
Best tv in 2020                    en
utilizar un servicio sms gratuito  es
utiliser un tv pour netflix        fr
```

**What I am using:**

```python
from textblob import TextBlob

b = TextBlob("utilizar un servicio sms gratuito")
print(b.detect_language())  # prints: es
```

I am not sure how to integrate this method to fill my pandas DataFrame.

**I have tried:**

```python
df['Language'] = TextBlob(df['Text']).detect_language()
```

But I am getting an error:

```text
TypeError: The `text` argument passed to `__init__(text)` must be a string, not <class 'pandas.core.series.Series'>
```

I understand what it means: I need to pass a string rather than a pandas Series. So my question is, how do I loop over the entire Series to detect the language of each row in the `Text` column?

Thank you for your suggestions.
# Answer 1

**Score**: 3

Use [`Series.apply`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.apply.html) with a lambda function:

```python
df['Language'] = df['Text'].apply(lambda x: TextBlob(x).detect_language())
```

Or `Series.map`:

```python
df['Language'] = df['Text'].map(lambda x: TextBlob(x).detect_language())
```

```python
print(df)
```

```text
                                Text  Language
0                    Best tv in 2020        en
1  utilizar un servicio sms gratuito        es
2        utiliser un tv pour netflix        fr
```
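One practical wrinkle with the per-row approach is that a real column may contain missing values or empty strings, and a detector may raise on input it cannot handle. The sketch below shows one way to guard the per-row call. The names `safe_detect`, `fake_detect`, and `SAMPLES` are hypothetical and not part of pandas or TextBlob, and `fake_detect` is only a lookup-table stand-in for a real detector such as `lambda s: TextBlob(s).detect_language()`. As far as I know, `detect_language` was deprecated in later TextBlob releases, which is another reason to keep the detector pluggable.

```python
from typing import Callable, Optional


def safe_detect(text: object,
                detect: Callable[[str], str],
                default: Optional[str] = None) -> Optional[str]:
    """Call `detect` on one cell value; return `default` if that is not possible."""
    # Missing cells in a pandas column arrive as None or float NaN, not str.
    if not isinstance(text, str) or not text.strip():
        return default
    try:
        return detect(text)
    except Exception:
        # A real detector may fail on very short text or on a network error.
        return default


# Hypothetical stand-in for a real detector: it only knows these samples.
SAMPLES = {
    "Best tv in 2020": "en",
    "utilizar un servicio sms gratuito": "es",
    "utiliser un tv pour netflix": "fr",
}


def fake_detect(text: str) -> str:
    return SAMPLES[text]  # raises KeyError for any text not in SAMPLES


# Plain list standing in for the values of df['Text'], including bad cells.
rows = ["Best tv in 2020", "utilizar un servicio sms gratuito",
        None, "   ", float("nan"), "unknown text"]
languages = [safe_detect(value, fake_detect) for value in rows]
print(languages)
```

With a DataFrame, the same wrapper could be plugged in as `df['Language'] = df['Text'].map(lambda t: safe_detect(t, fake_detect))`, swapping `fake_detect` for whichever real detector is available. Rows that cannot be classified then get `None` instead of stopping the whole assignment with an exception.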

huangapple
  • Posted on January 6, 2020 at 18:04:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59610076.html