I need advice on sentiment analysis and ML
Question
I'm using snscrape to scrape tweets about EURUSD, and I want to use machine learning on the sentiment of those tweets to predict whether the price of EURUSD will go up or down the following day. My problem is how to plan and structure the code. For example, should I feed the individual tweets to the ML model as features, or should I average the sentiment of all tweets for a given day and use that average as the feature? I would appreciate any advice from people who have worked on similar projects.
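To make the second option concrete, here is a minimal sketch of collapsing per-tweet sentiment into one row of features per day. The sample values, the column names (`sentiment_mean`, `tweet_count`, `share_positive`) and the helper `daily_sentiment_features` are all hypothetical and only serve as an illustration; the extra columns beyond the mean are one possible way to keep some of the information that plain averaging discards.

```python
import pandas as pd

# Hypothetical per-tweet sentiment scores (polarity in [-1, 1]); the values are made up.
tweets = pd.DataFrame({
    "date": pd.to_datetime([
        "2023-03-01 09:15", "2023-03-01 14:40", "2023-03-01 21:05",
        "2023-03-02 08:30", "2023-03-02 17:45",
    ]),
    "sentiment": [0.5, -0.25, 0.5, -0.5, 0.0],
})

def daily_sentiment_features(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse per-tweet sentiment into one feature row per calendar day."""
    by_day = df.groupby(df["date"].dt.date)["sentiment"]
    features = pd.DataFrame({
        "sentiment_mean": by_day.mean(),    # average polarity for the day
        "tweet_count": by_day.count(),      # how many tweets back that average
        "share_positive": by_day.apply(lambda s: (s > 0).mean()),  # fraction of positive tweets
    })
    features.index.name = "day"
    return features

daily = daily_sentiment_features(tweets)
print(daily)
```

Each row of `daily` can then be paired with that day's price label, which gives the model a fixed number of features per day no matter how many tweets were scraped.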
Answer 1
Score: -1
Provided you have the necessary tokens, you could try something like this:
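import pandas as pd
import snscrape.modules.twitter as sntwitter
from textblob import TextBlob
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Scrape up to 1000 tweets mentioning EURUSD.
tweets = []
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('EURUSD').get_items()):
    if i >= 1000:
        break
    # Newer snscrape releases expose this as tweet.rawContent; tweet.content is the older name.
    tweets.append([tweet.date, tweet.content])
df_tweets = pd.DataFrame(tweets, columns=['date', 'text'])
df_tweets.to_csv('tweets.csv', index=False)

def get_sentiment(text):
    """Map TextBlob polarity to -1 (negative), 0 (neutral) or 1 (positive)."""
    sentiment = TextBlob(text).sentiment.polarity
    if sentiment > 0:
        return 1
    elif sentiment < 0:
        return -1
    else:
        return 0

df_tweets['sentiment'] = df_tweets['text'].apply(get_sentiment)

# Average the sentiment per calendar day (tweet.date is a full timestamp).
df_tweets['date'] = pd.to_datetime(df_tweets['date']).dt.date
df_features = df_tweets.groupby('date').agg({'sentiment': 'mean'})
df_features.reset_index(inplace=True)

# ASSUMPTION: the original snippet never defines 'price_direction'. Here it is built
# from a hypothetical eurusd_daily.csv with 'date' and 'close' columns.
prices = pd.read_csv('eurusd_daily.csv', parse_dates=['date']).sort_values('date')
prices['price_direction'] = (prices['close'].shift(-1) > prices['close']).astype(int)
prices = prices.iloc[:-1].copy()  # the last day has no next-day close yet
prices['date'] = prices['date'].dt.date
df_labeled = df_features.merge(prices[['date', 'price_direction']], on='date')

# shuffle=False keeps the test set chronologically after the training set.
X_train, X_test, y_train, y_test = train_test_split(df_labeled[['sentiment']], df_labeled['price_direction'], test_size=0.2, shuffle=False)
rfc = RandomForestClassifier(n_estimators=100, random_state=42)
rfc.fit(X_train, y_train)
y_pred = rfc.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)

# Predict the next day's direction from the most recent day's average sentiment.
last_day_sentiment = df_features[['sentiment']].iloc[[-1]]
next_day_direction = rfc.predict(last_day_sentiment)[0]
print('Next day direction:', next_day_direction)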
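One detail worth spelling out is how the next-day `price_direction` label lines up with each day's sentiment, and why the train/test split should respect time order. The following is only a sketch on made-up numbers: the `close` values, the column names and the helper `add_next_day_label` are hypothetical, not part of the answer above.

```python
import pandas as pd

# Hypothetical daily data; the sentiment and close values are made up for illustration.
daily = pd.DataFrame({
    "day": pd.to_datetime(["2023-03-01", "2023-03-02", "2023-03-03", "2023-03-06", "2023-03-07"]),
    "sentiment_mean": [0.25, -0.25, 0.10, 0.40, -0.30],
    "close": [1.0660, 1.0600, 1.0630, 1.0680, 1.0550],
}).set_index("day")

def add_next_day_label(df: pd.DataFrame) -> pd.DataFrame:
    """Label each day with 1 if the next available close is higher, else 0."""
    out = df.copy()
    next_close = out["close"].shift(-1)
    out["price_direction"] = (next_close > out["close"]).astype(int)
    # The final row has no next close, so its label is unknown; drop it.
    return out.iloc[:-1]

labeled = add_next_day_label(daily)

# Chronological split: train on the earliest 75% of days, test on the rest.
split = int(len(labeled) * 0.75)
train, test = labeled.iloc[:split], labeled.iloc[split:]
print(labeled["price_direction"].tolist())
```

Splitting by position rather than at random means the model is always evaluated on days that come after everything it was trained on, which is closer to how it would be used in practice.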