Data Clustering Using Python
This portfolio focuses on stock price analysis, and its main topic relates GDP to stock market indicators. Record data on that topic is hard to gather and cluster because it is already clustered. Therefore, I picked 6 individual stocks from 3 different industries.
The first industry is streaming, represented by "DOYU" and "HUYA". The second is new energy cars, represented by "NIO" and "Li". The third is online video broadcasting, represented by "IQ" and "BILI".
I first grabbed the most recent 2 weeks of stock prices and did a preliminary data cleaning. Next, I created a new column called perc, which holds the percentage change over the day. Then I combined all 6 stocks into one dataframe. As always, my picture is attached below.
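The cleaning, perc, and combining steps can be sketched roughly as follows. This is only an illustration, not the original code: the column names "Open" and "Close", the helper add_perc, and the prices are all made up, and it assumes perc is measured from open to close within one day.

```python
import pandas as pd

def add_perc(prices: pd.DataFrame, ticker: str) -> pd.DataFrame:
    """Return a cleaned copy with a 'perc' column and a 'ticker' label."""
    # Preliminary cleaning: drop rows with a missing open or close price.
    out = prices.dropna(subset=["Open", "Close"]).copy()
    # Assumed definition of perc: (close - open) / open, as a percentage.
    out["perc"] = (out["Close"] - out["Open"]) / out["Open"] * 100
    out["ticker"] = ticker
    return out

# Made-up prices for illustration only (not real market data).
raw = {
    "DOYU": pd.DataFrame({"Open": [10.0, 10.5, float("nan")],
                          "Close": [10.5, 10.29, 10.0]}),
    "HUYA": pd.DataFrame({"Open": [20.0, 21.0],
                          "Close": [19.0, 21.42]}),
}

# Stack the per-stock frames into one dataframe.
combined = pd.concat(
    [add_perc(df, ticker) for ticker, df in raw.items()],
    ignore_index=True,
)
print(combined)
```

If perc is meant as the change from the previous day's close instead, pandas' Series.pct_change() on the "Close" column would be the natural replacement.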
The next step is to calculate the distance matrix using three methods: euclidean_distances, manhattan_distances, and cosine_distances.
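As a small sketch of this step, the three functions from sklearn.metrics.pairwise can be applied to a matrix with one row per stock. The layout (rows are stocks, columns are daily perc values) and the numbers are assumptions chosen so the results are easy to check by hand.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    cosine_distances,
    euclidean_distances,
    manhattan_distances,
)

# Rows = stocks, columns = daily percentage changes (made-up values).
tickers = ["DOYU", "HUYA", "NIO"]
X = np.array([
    [1.0, 2.0, 2.0],
    [2.0, 4.0, 4.0],   # same direction as row 0, twice the size
    [-2.0, 1.0, 0.0],  # orthogonal to row 0
])

# Each call returns a symmetric (n_stocks x n_stocks) matrix.
dist = {
    "euclidean": euclidean_distances(X),
    "manhattan": manhattan_distances(X),
    "cosine": cosine_distances(X),
}

for name, matrix in dist.items():
    print(name)
    print(np.round(matrix, 3))
```

The example shows why the three measures can disagree: the first two rows are far apart in Euclidean and Manhattan distance, but their cosine distance is 0 because they move in the same direction.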
After that, I used the k-means method with k = 3 to cluster my data. After clustering, I made a pair-to-pair comparison to see whether the clustering worked as expected. It did: the stocks were successfully clustered according to their industries.
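A minimal sketch of the k-means step and the pair-to-pair check, using scikit-learn's KMeans. The feature values are made up and deliberately well separated by industry, so this only illustrates the procedure and says nothing about the real result; the industry labels and the mismatches list are illustrative names.

```python
from itertools import combinations

import numpy as np
from sklearn.cluster import KMeans

tickers = ["DOYU", "HUYA", "NIO", "Li", "IQ", "BILI"]
industry = ["streaming", "streaming",
            "new energy car", "new energy car",
            "online video", "online video"]

# One row per stock; made-up perc values, not real market data.
X = np.array([
    [1.0, 1.1, 0.9],
    [1.1, 1.0, 1.0],
    [-3.0, -2.9, -3.1],
    [-3.1, -3.0, -2.9],
    [5.0, -5.0, 5.0],
    [5.1, -4.9, 5.0],
])

# k = 3, one cluster per industry; random_state fixes the initialization.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Pair-to-pair comparison: a pair is a mismatch when "same cluster"
# and "same industry" disagree.
mismatches = [
    (tickers[i], tickers[j])
    for i, j in combinations(range(len(tickers)), 2)
    if (labels[i] == labels[j]) != (industry[i] == industry[j])
]
print("cluster labels:", dict(zip(tickers, labels)))
print("mismatched pairs:", mismatches)
```

Comparing pairs rather than raw label values matters because k-means numbers its clusters arbitrarily, so label 0 need not correspond to any particular industry.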
Next, I used hierarchical clustering to cluster my data, and here is the result.
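One way to run hierarchical clustering is with SciPy's linkage and fcluster, sketched below. The Ward linkage method and the cut into 3 clusters are assumptions, since the linkage used here is not stated, and the feature values are made up.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

tickers = ["DOYU", "HUYA", "NIO", "Li", "IQ", "BILI"]

# One row per stock; made-up perc values, not real market data.
X = np.array([
    [1.0, 1.1, 0.9],
    [1.1, 1.0, 1.0],
    [-3.0, -2.9, -3.1],
    [-3.1, -3.0, -2.9],
    [5.0, -5.0, 5.0],
    [5.1, -4.9, 5.0],
])

# Build the merge tree bottom-up; Ward linkage is an assumed choice.
Z = linkage(X, method="ward")

# Cut the tree so that at most 3 flat clusters remain.
labels = fcluster(Z, t=3, criterion="maxclust")

for ticker, label in zip(tickers, labels):
    print(ticker, label)
```

To draw the tree itself, scipy.cluster.hierarchy.dendrogram(Z, labels=tickers) can be used together with matplotlib.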
After the record data, I analyze pure text data. The text data comes from comments, news, and reviews about APPLE's stock.
The code above for the record data can be found at MY PYTHON CODE
The code for the text data can be found here: Text cluster code