Naive Bayes and Support Vector Machine

What is Naive Bayes and how should we use it

First, we look at Naive Bayes: what it is, and how we apply it in this stock analysis.

Naive Bayes is a classifier derived from conditional probability. Conditional probability says that P(A|B) = P(A and B)/P(B). Since P(A and B) = P(B|A)P(A), we get Bayes' theorem, P(A|B) = P(B|A)P(A)/P(B), which is the formula behind Naive Bayes. In machine learning, B is a set of variables (features) and A is the label, and we want to use the occurrence of B to predict the probability of A, which is P(A|B). We use the training dataset, where the labels A are known, to estimate P(B|A) and P(A), and from these we can calculate the right-hand side. The method is called "naive" because it assumes the variables in B are independent of each other given A, so P(B|A) is the product of the per-variable probabilities.
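As a small worked example of the formula, the sketch below estimates each term from counts and checks that Bayes' theorem gives the same answer as the direct definition P(A and B)/P(B). The eight rows are invented for illustration, and the meanings attached to A ("stock went up") and B ("RSI is above 70") are assumptions, not values from our dataset.

```python
# Hypothetical toy data: each row is (B observed?, label A).
# B = "RSI is above 70" and A = "stock went up" are illustrative assumptions.
rows = [
    (True, "up"), (True, "up"), (True, "down"),
    (False, "up"), (False, "down"), (False, "down"),
    (True, "up"), (False, "down"),
]

n = len(rows)
n_up = sum(1 for b, a in rows if a == "up")
n_b = sum(1 for b, a in rows if b)
n_b_and_up = sum(1 for b, a in rows if b and a == "up")

p_a = n_up / n                   # P(A): prior, from the labels
p_b = n_b / n                    # P(B): evidence
p_b_given_a = n_b_and_up / n_up  # P(B|A): likelihood, learned from training data

posterior = p_b_given_a * p_a / p_b  # Bayes' theorem: P(A|B)
direct = n_b_and_up / n_b            # definition: P(A and B) / P(B)

print(posterior, direct)
```

Both numbers agree, which is the point of the derivation: the right-hand side can be computed entirely from quantities estimated on labeled training data.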

Record data NB using R

In this stock analysis, we plan to use stock indicators such as RSI, SMA and LMA to predict whether the stock price goes up or down on a given day.


Here we split our data into a training set and a testing set, then run Naive Bayes on it. After getting the result, we build a confusion matrix to illustrate it. The first picture below is our NB result, and the second is the predicted vs. actual confusion matrix.



Here we reach an important conclusion. When the model predicts that the stock will go down, there is a 67% chance that it actually goes down. When it predicts that the stock will go up, there is a 60% chance that it goes up. So we can use this model to predict our results.
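Percentages of this kind are the per-prediction precision: of all the days the model predicted a label, the share on which that label was correct. The sketch below shows the calculation, assuming the matrix is indexed as predicted by actual. The counts are hypothetical, chosen only so the arithmetic lands on figures like the ones quoted; they are not the counts from our confusion matrix.

```python
# confusion[predicted][actual] = number of test days.
# HYPOTHETICAL counts for illustration only.
confusion = {
    "down": {"down": 20, "up": 10},
    "up": {"down": 20, "up": 30},
}

def precision(confusion, label):
    """Share of the predictions of `label` that turned out to be correct."""
    row = confusion[label]
    total = sum(row.values())
    return row[label] / total if total else 0.0

for label in confusion:
    print(f"predicted {label}: correct {precision(confusion, label):.0%} of the time")
```

Reading the matrix this way answers the trading question directly: given that the model says "down", how often should we believe it?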

Record data NB using Python

Similarly, we run this in Python. NB in Python gave me a very different result, and here is the confusion matrix.


The Python model predicts both up and down often, but its prediction accuracy is not the same for up as for down.
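The workflow used in the record-data sections (split the data, fit Naive Bayes, build a confusion matrix) can be sketched in plain Python. This is a minimal from-scratch Gaussian Naive Bayes on synthetic data, not the code that produced the results above. The data generator, the two feature columns and the 80/20 split are all assumptions made for illustration; a library implementation such as scikit-learn's `GaussianNB` would normally be used instead.

```python
import math
import random
from collections import defaultdict

random.seed(0)

# Synthetic stand-in for the record data: two hypothetical numeric
# features per day and an "up"/"down" label.
def make_row():
    label = random.choice(["up", "down"])
    shift = 1.0 if label == "up" else -1.0
    return [random.gauss(50 + 5 * shift, 10), random.gauss(shift, 2)], label

data = [make_row() for _ in range(400)]
split = int(0.8 * len(data))  # assumed 80% training, 20% testing
train, test = data[:split], data[split:]

def fit(rows):
    """Estimate the class prior and per-feature mean/variance for each label."""
    by_label = defaultdict(list)
    for x, y in rows:
        by_label[y].append(x)
    model = {}
    for y, xs in by_label.items():
        stats = []
        for col in zip(*xs):
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[y] = (len(xs) / len(rows), stats)
    return model

def predict(model, x):
    """Return the label maximising log P(label) + sum_i log P(x_i | label)."""
    def log_posterior(y):
        prior, stats = model[y]
        score = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

model = fit(train)

# confusion[(predicted, actual)] = count on the held-out test set
confusion = defaultdict(int)
for x, y in test:
    confusion[(predict(model, x), y)] += 1

accuracy = sum(c for (p, a), c in confusion.items() if p == a) / len(test)
print(dict(confusion), round(accuracy, 2))
```

The sum over features inside `log_posterior` is where the "naive" independence assumption enters: each feature contributes its own term, with no interaction between them.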

Text data NB using R

In this section, we analyze a text corpus and predict each file's label from its content. We have 12 text files, each with its own label; the screenshot is below.


Here I run NB on them, and here is my prediction.


Based on our model, all of the files are predicted to be Nvidia, which is not correct. This may be because our data is not very good: the texts come from Twitter, and the Nvidia corpus may contain more general wording, which would lead the model to make false predictions.

Text data NB using Python

We run similar code in a different language, but here I try changing the modeling method.

Also, in Python, I changed the text corpus into two folders, where each folder represents a different label. I run NB, and here is the confusion matrix.


Here, the first parameter setting is the best fit, since it successfully predicts all labels.
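To make the text case concrete, here is a minimal sketch of multinomial Naive Bayes over word counts, written from scratch in plain Python. The four training sentences and their labels are invented for illustration (in the analysis above, each label is a folder of text files), and add-one smoothing is an assumed choice, not necessarily the setting used for the results above.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mini-corpus: (label, text) pairs standing in for labeled files.
train_docs = [
    ("apple", "new iphone launch boosts apple sales"),
    ("apple", "apple iphone demand stays strong"),
    ("nio", "nio electric car deliveries rise"),
    ("nio", "nio opens battery swap stations for electric car owners"),
]

def fit(docs):
    word_counts = defaultdict(Counter)  # label -> word frequencies
    doc_counts = Counter()              # label -> number of documents
    for label, text in docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def predict(model, text):
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen in training
            # add-one (Laplace) smoothing avoids zero probabilities
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = fit(train_docs)
print(predict(model, "iphone sales are strong"))
print(predict(model, "electric car battery"))
```

Each word adds its own log-probability to the score, so a label whose training text uses a word more often than the other label does gains an advantage on that word.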

What is SVM and how should we use it

A support vector machine (SVM) is a supervised machine learning model that uses classification algorithms for two-group classification problems. After an SVM model is given sets of labeled training data for each category, it is able to categorize new text.

For example, if we use SVM to model our record data, each row becomes a point in a multi-dimensional space, and we look for a line (in higher dimensions, a hyperplane) that separates the classes. The training points closest to this boundary are the support vectors, and they determine where it lies. When there are more than two labels, several such boundaries can be used. For text data it is basically the same, but the dimensionality is significantly higher.
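The idea of finding a separating line can be sketched with a tiny linear SVM trained by subgradient descent on the hinge loss. This is a simplified illustration in plain Python, not the solver a library implementation such as scikit-learn's `SVC` uses; the six points, the learning rate and the number of epochs are all assumptions.

```python
# Hypothetical 2-D points with labels +1 / -1 (for example "up" / "down").
points = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0), (-2.0, -1.0), (-3.0, -2.5), (-1.5, -3.0)]
labels = [1, 1, 1, -1, -1, -1]

def train_linear_svm(points, labels, C=1.0, lr=0.01, epochs=200):
    """Minimise 0.5 * ||w||^2 + C * sum(hinge loss) by subgradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            # gradient of the regulariser 0.5 * ||w||^2
            g0, g1, gb = w[0], w[1], 0.0
            if margin < 1:  # point lies inside the margin or is misclassified
                g0 -= C * y * x1
                g1 -= C * y * x2
                gb -= C * y
            w[0] -= lr * g0
            w[1] -= lr * g1
            b -= lr * gb
    return w, b

def predict(w, b, point):
    """Classify by which side of the line w.x + b = 0 the point falls on."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1

w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, p) for p in points]
print(w, b, predictions)
```

Only points with a margin below 1 push the boundary, which mirrors the role of support vectors. The cost `C` controls how strongly those points are weighted against keeping `w` small.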

Record Data using SVM

Using the same data and changing only the machine learning method to SVM, we get an accuracy of 0.59.


Now we build the confusion matrix, and here is what we got.


Here we try different values of the cost (C), and here is the plot.


Although we changed C, the accuracy stays the same, so tuning C may not help here.

Text Data using R

We are using the same dataset, so the only thing we change here is the machine learning method, to SVM.


It successfully predicted Nvidia, but it predicted all the rest to be Tesla.

Record data in python for SVM

Now we run SVM on our record data in Python, and here is the confusion matrix.


When we try a different kernel, we get a different prediction.


We can see that this model is better at predicting the upward trend.

Text data in python for SVM

We are using the same dataset but change the method to SVM. Here is the predicted vs. actual confusion matrix.


Since only one prediction is wrong, we call this a good machine learning model for the data.

Here we change to different kernels and get different confusion matrices. The first one uses the rbf kernel and the second one uses the poly kernel, but neither gives as good a prediction as the original one.
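A kernel is the similarity function the SVM uses in place of a plain dot product. The sketch below writes out the usual textbook forms of the linear, rbf and poly kernels in plain Python. The default values for `gamma`, `degree` and `coef0` and the two sample vectors are assumptions for illustration, not the settings used in the runs above.

```python
import math

def linear_kernel(x, y):
    """Plain dot product <x, y>."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    """exp(-gamma * ||x - y||^2): equals 1 for identical points, decays with distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def poly_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <x, y> + coef0) ** degree."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree

# Two hypothetical feature vectors
u, v = [1.0, 2.0], [2.0, 0.0]
print(linear_kernel(u, v), rbf_kernel(u, v), poly_kernel(u, v))
```

Because each kernel measures similarity differently, the same training data can yield different decision boundaries and therefore different confusion matrices.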


Here is a visualization of the top words in our corpus and how each word is related to the label: red is Apple and blue is NIO.


The Python and R code are attached below.

python code R code

Conclusion

Naive Bayes and SVM are two good tools for analyzing stocks. They help us identify whether a stock will go up or down on a given day. Given this, we can apply them to quantitative trading, or computational trading, where a computer trades stocks automatically. By employing this, we can make a profit if our expected return is positive given our predicted trend.

For text, NB and SVM can help us identify what people are talking about. For example, we can get the sentiment of what people are saying and use that to predict the stock market. These are very useful tools in stock trading and stock prediction.