Decision Tree and Random Forest

Because this portfolio is about stocks, supervised learning is its most interesting part. I am going to use a stock's technical indicators to predict its trend on a single day.

First, I gathered price data for one stock, calculated its "RSI", "SMA", "lma", and "adx", and deleted all other information. Besides these features, I created a new feature called "Return", which records whether the stock trended up or down on a given day.
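The feature step above can be sketched in plain Python. This is only an illustration under assumptions: the closing prices are made up, the 3-day window is arbitrary, and "Return" is assumed to compare each close with the previous day's close, which the write-up does not state. The function names are hypothetical.

```python
def simple_moving_average(prices, window):
    """Return the SMA per day; days without a full window get None."""
    sma = []
    for i in range(len(prices)):
        if i + 1 < window:
            sma.append(None)
        else:
            sma.append(sum(prices[i + 1 - window:i + 1]) / window)
    return sma


def return_labels(prices):
    """Label a day "Up" if its close rose versus the previous day, else "Down"."""
    labels = [None]  # the first day has no previous close to compare with
    for prev, curr in zip(prices, prices[1:]):
        labels.append("Up" if curr > prev else "Down")
    return labels


closes = [10.0, 11.0, 10.5, 12.0, 12.5]  # made-up closing prices
sma3 = simple_moving_average(closes, 3)
labels = return_labels(closes)
```

RSI and adx would be computed in the same rolling fashion; they are left out here to keep the sketch short.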


Using R to run a Decision Tree on this record data

After loading the data, I separated it into two parts, a training set and a testing set. I deleted the labels from the testing data and fit a decision tree model on the training data. The final decision tree is shown below.
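A minimal sketch of the split, written in Python for illustration even though this part of the analysis was done in R. The 30% test fraction, the random shuffle, the seed, and the made-up rows are all assumptions; for time-ordered stock data a chronological split may be the safer choice.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and cut it into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up records with two indicators and the "Return" label.
rows = [
    {"rsi": 40 + i, "adx": 20 + i, "Return": "Up" if i % 2 else "Down"}
    for i in range(10)
]
train, test = train_test_split(rows)

# Keep the true labels aside, then strip them from the testing data.
test_labels = [row["Return"] for row in test]
test_features = [
    {key: value for key, value in row.items() if key != "Return"}
    for row in test
]
```

The held-back `test_labels` are what the model's predictions get compared against later.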


From this decision tree, we can see that when the parameter "RSI" is greater than 42, adx > 44, and sma > 30, the stock price is more likely to go up; when sma < 30, however, the stock is likely to go down. Also, when rsi is between 41 and 42, the stock is very likely to go down. An rsi of exactly 42 is a neutral value, which indicates that the stock is not likely to move significantly up or down.

Let's change the root and see what happens: instead of rsi, we use adx as the root.


From this tree, we can see that there is no clear line separating the upward trend from the downward trend. The entropy and Gini values are very high because the probability of going up is almost the same as the probability of going down. Therefore, I conclude that adx is not a good parameter for predicting this stock.
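To see why a near 50/50 node is uninformative, here is a small sketch of the two impurity measures for a two-class node. The function names are hypothetical; the formulas are the standard binary entropy and Gini impurity.

```python
import math


def entropy(p_up):
    """Binary entropy in bits for a node where a fraction p_up is "Up"."""
    if p_up in (0.0, 1.0):
        return 0.0  # a pure node has no uncertainty
    p_down = 1.0 - p_up
    return -(p_up * math.log2(p_up) + p_down * math.log2(p_down))


def gini(p_up):
    """Gini impurity for the same two-class node."""
    p_down = 1.0 - p_up
    return 1.0 - p_up ** 2 - p_down ** 2


balanced_entropy = entropy(0.5)  # maximum for two classes
balanced_gini = gini(0.5)        # maximum for two classes
skewed_gini = gini(0.9)          # a mostly "Up" node is much purer
```

Both measures reach their maximum when up and down are equally likely, which is the situation in the adx-rooted tree.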

Using Python to build a Decision Tree and Random Forest for text data

The text data are from Twitter. The first folder consists of 10 text corpora about "NIO" and the second folder consists of 10 text corpora about the "APPLE" company.


After getting these text corpora, I loaded them into Python and cleaned them: stemmed the words, removed punctuation, converted everything to lowercase, and so on. I then converted the cleaned text into four dataframes.
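A rough sketch of the cleaning step, using only the standard library. The sample tweet is made up and the function names are hypothetical. Stemming is deliberately left out because it normally relies on an external library, so this covers only lowercasing, punctuation removal, and word counting.

```python
import string
from collections import Counter


def clean_text(text):
    """Lowercase the text, drop ASCII punctuation, and split into tokens."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return no_punct.split()


def bag_of_words(text):
    """Count how often each cleaned token occurs."""
    return Counter(clean_text(text))


tweet = "NIO is UP today!!! Buy NIO, buy now."  # made-up example
tokens = clean_text(tweet)
counts = bag_of_words(tweet)
```

Word counts like these, one row per document, are the kind of table a tree model can be trained on.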

After that, I ran a decision tree from scikit-learn and got the following picture.


From this decision tree we can see that the entropy is high, so I don't think it can make a very good prediction. This may be because Twitter has so many users, and they all share different opinions about NIO and Apple. Therefore, I changed the criterion from entropy to the Gini index to see what happens.


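From here we can see that Gini looks better than entropy in this model. Keep in mind that the two measures are on different scales (for two classes, Gini peaks at 0.5 and entropy at 1), so the comparison should rest on prediction quality rather than on the raw impurity numbers.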

Here is the confusion matrix
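For reference, a confusion matrix can be built by hand as in the sketch below. The actual and predicted labels are invented for illustration and are not the results of this project; rows are the true classes and columns are the predicted classes.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(actual, predicted):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


# Invented labels, only to show the mechanics.
actual = ["NIO", "NIO", "APPLE", "APPLE", "NIO"]
predicted = ["NIO", "APPLE", "APPLE", "APPLE", "NIO"]

matrix = confusion_matrix(actual, predicted, ["NIO", "APPLE"])
# Correct predictions sit on the diagonal.
accuracy = (matrix[0][0] + matrix[1][1]) / len(actual)
```

The off-diagonal cells show which class gets mistaken for the other.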


Next, I use a random forest to model my data.
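The idea behind a random forest can be sketched in a few lines: train many small trees on bootstrap samples and let them vote. This toy version is an assumption-heavy simplification, not the scikit-learn implementation: each "tree" is a single split on one randomly chosen word, the documents are made up, and all function names are hypothetical.

```python
import random
from collections import Counter


def majority(labels, default):
    """Most common label in the list, or default when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(sample, word):
    """One-split tree: predict by whether `word` occurs in the document."""
    overall = majority([label for _, label in sample], None)
    with_word = [label for words, label in sample if word in words]
    without_word = [label for words, label in sample if word not in words]
    return word, majority(with_word, overall), majority(without_word, overall)


def fit_forest(docs, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a random word."""
    rng = random.Random(seed)
    vocabulary = sorted({word for words, _ in docs for word in words})
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(docs) for _ in docs]  # draw with replacement
        forest.append(fit_stump(sample, rng.choice(vocabulary)))
    return forest


def predict(forest, words):
    """Every stump votes; the most common vote wins."""
    votes = [
        if_present if word in words else if_absent
        for word, if_present, if_absent in forest
    ]
    return majority(votes, None)


# Made-up documents, each reduced to its set of words.
docs = [({"nio", "ev"}, "NIO")] * 3 + [({"apple", "iphone"}, "APPLE")] * 3
forest = fit_forest(docs)
prediction = predict(forest, {"nio", "ev"})
```

A real forest grows full trees and samples a random subset of features at every split, but the bootstrap-then-vote structure is the same.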


It produced many predictors, which are included below.


This is the random forest picture. Next, I am going to do some visualizations. For example, which feature is the most important? Which one is the most frequent?


This is another matrix that can help us understand the model.


Conclusion

In conclusion, a decision tree is a very powerful tool for determining which parameters are key when forecasting whether a stock will increase or decrease in a given day or period. This is extremely important when building an overall model for quantitative trading.

A random forest is more useful than a single decision tree in stock trading. Instead of relying on one tree, it builds many decision trees and combines their votes, which makes the prediction less dependent on any one tree's quirks. This is important for picking the best parameters to predict stocks and for doing other analysis.