I need advice on sentiment analysis and ML

Question

I'm using snscrape to scrape tweets about EURUSD, and I want to combine the sentiment of those tweets with machine learning to predict whether the price of EURUSD will go up or down the following day. The problem I have with this project is how to plan and structure my code. For example, should I use the individual tweets as features for the ML model, or should I average the sentiments of the tweets for each day and use that average as the feature? I would appreciate any advice from people who have worked on similar projects.

Answer 1

Score: -1

Provided that you have the necessary tokens, you can try something like this:

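import pandas as pd
import snscrape.modules.twitter as sntwitter
from textblob import TextBlob
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


# Scrape up to 1000 tweets mentioning EURUSD.
tweets = []
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('EURUSD').get_items()):
    if i >= 1000:
        break
    # Newer snscrape releases expose the text as tweet.rawContent.
    tweets.append([tweet.date, tweet.content])
df_tweets = pd.DataFrame(tweets, columns=['date', 'text'])
df_tweets.to_csv('tweets.csv', index=False)


# Map TextBlob polarity to a label: 1 positive, -1 negative, 0 neutral.
def get_sentiment(text):
    sentiment = TextBlob(text).sentiment.polarity
    if sentiment > 0:
        return 1
    elif sentiment < 0:
        return -1
    else:
        return 0
df_tweets['sentiment'] = df_tweets['text'].apply(get_sentiment)


# tweet.date is a full timestamp, so reduce it to a calendar day before averaging.
df_tweets['date'] = pd.to_datetime(df_tweets['date'], utc=True).dt.date
df_features = df_tweets.groupby('date').agg({'sentiment': 'mean'})
df_features.reset_index(inplace=True)


# ASSUMPTION: daily prices come from a file you supply. 'eurusd_daily.csv' and its
# 'date' and 'close' columns are hypothetical names, not part of the original answer.
df_prices = pd.read_csv('eurusd_daily.csv', parse_dates=['date']).sort_values('date')
df_prices['date'] = df_prices['date'].dt.date
# Label a day 1 if the next day's close is higher than that day's close, else 0.
df_prices['price_direction'] = (df_prices['close'].shift(-1) > df_prices['close']).astype(int)
df_prices = df_prices.iloc[:-1]  # the last day has no next-day close to compare with
df_labeled = df_features.merge(df_prices[['date', 'price_direction']], on='date').sort_values('date')


# shuffle=False keeps the test set later in time than the training set.
X_train, X_test, y_train, y_test = train_test_split(df_labeled[['sentiment']], df_labeled['price_direction'], test_size=0.2, shuffle=False)


rfc = RandomForestClassifier(n_estimators=100, random_state=42)
rfc.fit(X_train, y_train)


y_pred = rfc.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)


# Predict from the most recent day's average sentiment.
last_day = df_features.iloc[[-1]][['sentiment']]
next_day_direction = rfc.predict(last_day)[0]
print('Next day direction:', next_day_direction)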
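
On the question of per-tweet features versus a daily average: since the target is one label per day, it probably makes sense to build one row per day, and a single mean can be supplemented with a few more per-day summaries. Below is a minimal sketch of that idea, written with only the standard library so it runs without the scraped data. The function names `daily_sentiment_features` and `next_day_direction` are hypothetical, and every number in the toy data is invented for illustration, not real tweet or price data.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_sentiment_features(scored_tweets):
    """Group (day, polarity) pairs by day and summarise each day.

    Returns {day: {"mean": ..., "count": ..., "share_positive": ...}}.
    """
    by_day = defaultdict(list)
    for day, polarity in scored_tweets:
        by_day[day].append(polarity)
    return {
        day: {
            "mean": mean(scores),
            "count": len(scores),
            "share_positive": sum(s > 0 for s in scores) / len(scores),
        }
        for day, scores in sorted(by_day.items())
    }


def next_day_direction(closes):
    """Label each day 1 if the following day's close is higher, else 0.

    closes maps day -> closing price. The final day has no following
    close, so it gets no label.
    """
    days = sorted(closes)
    return {
        today: int(closes[tomorrow] > closes[today])
        for today, tomorrow in zip(days, days[1:])
    }


# Toy data, invented purely for illustration.
scored = [
    (date(2023, 3, 6), 0.5), (date(2023, 3, 6), -0.25),
    (date(2023, 3, 7), 0.0), (date(2023, 3, 7), 0.75), (date(2023, 3, 7), 0.25),
    (date(2023, 3, 8), -0.5),
]
closes = {date(2023, 3, 6): 1.00, date(2023, 3, 7): 1.02, date(2023, 3, 8): 1.01}

features = daily_sentiment_features(scored)
labels = next_day_direction(closes)

# One training row per day that has both features and a label.
rows = [(day, features[day], labels[day]) for day in features if day in labels]
for row in rows:
    print(row)
```

The tweet count and the share of positive tweets are just two candidate extras. Whether they help is something to check on your own data, ideally with a split that keeps the test days later than the training days.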


huangapple
  • Published on March 12, 2023, 17:04:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75712050.html