First, we are going to talk about Naive Bayes: what it is, and how we should apply it in this stock analysis.
Naive Bayes is built on Bayes' theorem, which is derived from conditional probability. Conditional probability says that P(A|B) = P(A and B)/P(B). Since P(A and B) = P(B|A)P(A), we get Bayes' theorem: P(A|B) = P(B|A)P(A)/P(B). In machine learning, B is a set of variables, and we want to use the occurrence of B to predict the probability of A, which is P(A|B). We achieve that by using the training dataset, in which A is known as our labels, to estimate P(B|A) and P(A), and thereby we can calculate the right-hand side. The method is called "naive" because it assumes the variables in B are independent of each other given the label A.
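As a small sketch of the formula above, here is Bayes' theorem applied to some counts. The counts and the names (bayes_posterior, days_up and so on) are made up for illustration and are not from the stock data.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical counts over 100 trading days (illustration only).
# A = "price went up", B = "indicator was above its threshold".
days_total = 100
days_up = 40              # days on which A happened
days_signal = 50          # days on which B happened
days_up_and_signal = 30   # days on which both A and B happened

p_a = days_up / days_total                   # P(A)
p_b = days_signal / days_total               # P(B)
p_b_given_a = days_up_and_signal / days_up   # P(B|A)

posterior = bayes_posterior(p_b_given_a, p_a, p_b)

# The same value computed directly from the definition P(A and B) / P(B).
direct = (days_up_and_signal / days_total) / p_b

print(posterior, direct)
```

Both routes give the same P(A|B), which is the point of the derivation: the training labels let us estimate the right-hand side.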
In this stock analysis, I plan to use stock parameters like RSI, SMA and LMA to predict whether the stock price goes up or down on a given day.
Here, I split our data into a training set and a testing set, and after that I run Naive Bayes and get the result. After getting the result, I made a confusion matrix to illustrate it. The first picture below is our NB result and the second one is the predicted vs. actual confusion matrix.
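The steps above (split, fit, confusion matrix) can be sketched in Python with scikit-learn. This is only a sketch under assumptions: the data is synthetic, the three columns merely stand in for indicators like RSI, SMA and LMA, and GaussianNB is my choice for numeric features, not necessarily the model used for the pictures.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Synthetic stand-in for the record data: three indicator columns
# and an up/down label (1 = up, 0 = down). Not the real stock data.
n_days = 200
X = rng.normal(size=(n_days, 3))
noise = rng.normal(scale=0.5, size=n_days)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)

# Hold out 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are actual labels, columns are predicted labels (0 = down, 1 = up).
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)
```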
Here, we have a very important conclusion. If the predictor predicts that the stock is going down, there is a 67% chance that it will go down. And when it predicts that the stock will go up, there is a 60% chance that it will go up as well. So we can use this model to make our predictions.
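Percentages like these are read off the confusion matrix column by column: out of all the days predicted as one label, how many actually had that label. Here is a small sketch of that calculation. The counts are hypothetical and are not the numbers behind the 67% and 60% above, and the helper name precision is my own.

```python
# Hypothetical confusion matrix counts, for illustration only.
# Keys are (actual, predicted) pairs.
counts = {
    ("down", "down"): 30,
    ("up", "down"): 10,
    ("down", "up"): 12,
    ("up", "up"): 28,
}


def precision(counts, label):
    """Share of days predicted as `label` whose actual outcome was `label`."""
    predicted_total = sum(n for (_, pred), n in counts.items() if pred == label)
    if predicted_total == 0:
        raise ValueError(f"no predictions for label {label!r}")
    return counts[(label, label)] / predicted_total


precision_down = precision(counts, "down")  # 30 / (30 + 10)
precision_up = precision(counts, "up")      # 28 / (12 + 28)

print(precision_down, precision_up)
```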
Similarly, I ran this in Python. NB in Python gave me a very different result, and here is the confusion matrix.
The Python model predicts both up and down a lot, but its prediction accuracy is different for up than for down.
In this section, I am going to analyze a text corpus and classify each file based on its content. There are 12 text files here and each has its own label; I will put the screenshot below.
Here I run NB on them, and here is my prediction.
Based on our model, all of them are predicted to be Nvidia, but that is not actually the case. It may be because our data is not very good: the files come from Twitter, and maybe the Nvidia corpus is more general, which leads the model to make false predictions.
I run similar code in a different language, but here I try to change the modeling method.
Also, in Python, I change the text corpus into two different folders, where each folder represents a different label. I run NB and here is the confusion matrix.
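A rough sketch of NB on labeled text in Python is below. It assumes a bag-of-words representation (CountVectorizer) with MultinomialNB. The documents and labels are invented and kept inline instead of being read from folders, so this is not the actual corpus or code.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.naive_bayes import MultinomialNB

# Toy documents standing in for the text files. All strings are invented;
# the two labels mimic the idea of one folder per label.
train_docs = [
    "new gpu chips for gaming and ai",
    "graphics card and gpu driver update",
    "electric car battery and charging range",
    "new electric car model delivery",
]
train_labels = ["nvidia", "nvidia", "tesla", "tesla"]

test_docs = ["gpu for ai training", "electric car charging"]
test_labels = ["nvidia", "tesla"]

# Turn each document into a vector of word counts.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)

model = MultinomialNB()
model.fit(X_train, train_labels)
predicted = model.predict(X_test)

# Rows are actual labels, columns are predicted labels.
cm = confusion_matrix(test_labels, predicted, labels=["nvidia", "tesla"])
print(cm)
```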
Here, the first parameter is the best fit, since it successfully predicts all labels.
A support vector machine (SVM) is a supervised machine learning model that uses classification algorithms for two-group classification problems. After an SVM model is given sets of labeled training data for each category, it is able to categorize new text.
For example, if we use SVM to model our record data, each row is a point in a multi-dimensional space and we draw a line (in general, a hyperplane) to separate the classes. The training points closest to the separating boundary are the support vectors, and they define where the boundary sits. More than one boundary can be used when there are more than two classes. For text data it is basically the same, but the dimensionality is significantly higher.
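To make the idea of points and a separating line concrete, here is a minimal sketch with scikit-learn's SVC and a linear kernel. The six 2-D points are invented, and the real record data has more columns.

```python
import numpy as np
from sklearn.svm import SVC

# Two small, clearly separated groups of 2-D points (invented data).
X = np.array([
    [1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # class 0
    [5.0, 5.0], [5.5, 6.0], [6.0, 5.5],   # class 1
])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The support vectors are the training points closest to the boundary.
print(clf.support_vectors_)

# For a linear kernel the boundary is the line w . x + b = 0.
w, b = clf.coef_[0], clf.intercept_[0]
print(w, b)

# Classify two new points, one near each group.
new_points = np.array([[1.2, 1.8], [5.8, 5.2]])
predicted = clf.predict(new_points)
print(predicted)
```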
Using the same data and changing only the machine learning method to SVM, we have an accuracy of 0.59.
Now we run the confusion matrix, and here is what we got.
Here, I try different values of the cost (C), and here is the plot.
Although we changed C, the accuracy stays the same, so maybe tuning C doesn't work here.
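Trying several cost values can be sketched as a simple loop in Python. The data here is synthetic and the list of C values is an arbitrary choice for illustration, so the accuracies it prints say nothing about the stock data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Synthetic two-class data (not the stock data).
X = rng.normal(size=(300, 3))
noise = rng.normal(scale=0.8, size=300)
y = (X[:, 0] - X[:, 1] + noise > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Fit one linear SVM per cost value and record the test accuracy.
accuracy_by_c = {}
for c in [0.01, 0.1, 1, 10, 100]:
    clf = SVC(kernel="linear", C=c)
    clf.fit(X_train, y_train)
    accuracy_by_c[c] = clf.score(X_test, y_test)

for c, acc in accuracy_by_c.items():
    print(f"C={c}: accuracy={acc:.3f}")
```

If the accuracy barely moves across C, one possible reading is that the classes overlap in these features, so the cost setting is not the limiting factor.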
We are using the same dataset, so the only thing we change here is the machine learning method, which is now SVM.
It predicted Nvidia correctly, and it predicted all the rest to be Tesla.
Now we run SVM on our record data in Python, and here is the confusion matrix.
When we try a different kernel, we get a different prediction.
We can see that the upward trend is predicted better here.
We are using the same dataset but change to SVM. Here is the confusion matrix of predicted vs. actual labels.
Since only one label is predicted wrong, I call this a good machine learning model for the data.
Here I change the kernel and get different confusion matrices. The first one uses the RBF kernel and the second one uses the poly kernel, but neither gives as good a prediction as the original one.
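Comparing kernels as described above can be sketched like this. It is only an illustration on synthetic 2-D data with a circular class boundary, which I chose so that the kernels behave differently. It is not the text corpus and not the original code.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# Synthetic data: the label depends on the distance from the origin,
# so a straight line cannot separate the two classes.
X = rng.normal(size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2
)

# One confusion matrix per kernel (rows = actual, columns = predicted).
matrices = {}
for kernel in ["linear", "rbf", "poly"]:
    clf = SVC(kernel=kernel, C=1.0)
    clf.fit(X_train, y_train)
    matrices[kernel] = confusion_matrix(
        y_test, clf.predict(X_test), labels=[0, 1]
    )

for kernel, cm in matrices.items():
    print(kernel)
    print(cm)
```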
Here is a visualization of the top words in our corpus and how each word is related to the label, where red is Apple and blue is NIO.
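The counting behind a top-words picture like this can be sketched with the standard library. The documents below are invented and the helper name top_words is my own; the plotting itself is left out.

```python
from collections import Counter

# Invented documents grouped by label (illustration only).
docs_by_label = {
    "apple": [
        "iphone sales beat iphone forecast",
        "new iphone and mac sales",
    ],
    "nio": [
        "nio delivers new electric cars",
        "nio electric battery swap at nio",
    ],
}


def top_words(docs, k=2):
    """Return the k most frequent lowercase words across the documents."""
    counts = Counter(word for doc in docs for word in doc.lower().split())
    return counts.most_common(k)


top_by_label = {label: top_words(docs) for label, docs in docs_by_label.items()}

for label, words in top_by_label.items():
    print(label, words)
```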
The code for Python and R is attached below.
Python code, R code
Naive Bayes and SVM are two good tools for me to analyze stocks. They help me identify whether the stock will go up or down on a given day. Given this, we can apply them to quantitative trading, that is, computational trading where a computer automatically trades stocks. With this approach, we can make a profit if our expected return is positive given our predicted trend.
For text, NB and SVM can help us identify what people are talking about. For example, we can get the sentiment of what people are saying and use that to predict the stock market. These are very useful tools for stock trading and stock prediction.