How Best to Interpret This Data for a Prediction Model

CPerry

Hi there, apologies if this is the wrong area to post in (I think this question best comes under probability) so please, foremost, direct me to an appropriate area if I am mistaken. If I am in the correct area then this is my problem...

I have a moving graph (using Microsoft Excel) that updates once a second and I am hoping to interpret a single value from it. I will use this value to test whether it has any significance in forecasting future moves over time. A snapshot of the graph looks like this...

Screen Shot 2018-09-04 at 12.59.37.png

As you can see, most of the recent activity is at 2.6, so logically it appears that the next lot of movement will be centred around this. I initially thought that simply taking the slope of the linear trend line would be sufficient, but this indicates that there is a strong force pushing below 2.0 when in fact 2.0 has only about one fifth of the volume of 2.6 in this snapshot. It is worth pointing out that the X axis isn't fixed and can move substantially beyond 2.0 and 4.6; that's the point. I'm trying to forecast where moves are more likely to happen next.

I think just taking a simple line of best fit ignores a lot of information that this graph is showing. What I thought next was to use some form of advanced distribution model, but I am unsure which would be of most use, hence my post here to gather a few experts' thoughts. Do I use: binomial, hypergeometric, normal, Poisson, chi-squared?

In basic terms, what do you think is the best mathematical tool to use to predict whether the price is going to move left, right or stay the same in the near future? Why do you think this, and is there any more information that I need to supply to make it more accurate?

Cheers,

CPerry.
 
This is likely to be a very unsatisfactory answer.

As I understand it, you have two variables for which you have data points. You want to find what is the correlation between the two variables.

First, there may not be a stable correlation so any apparent relationship that you find may be misleading because it will be only temporary. A may be in love with B this year but despise B next year: divorces happen.

But you may have reason to believe (or to hope) that the relationship between your variables is reasonably stable. There are literally an infinite number of possible relationships. You have implicitly assumed (or rather Excel has implicitly assumed without telling you) that the relationship is linear. As you quite correctly have noticed, the empirical evidence does not strongly support that assumption. The limited amount of data that you have shown does not look at all linear.

This is why theory is important. If you have a theory about the relationship, you can use the data to test the theory. It may be wrong, but at least you are not looking for a needle that may not even exist in an infinite haystack.
 
Hi JeffM, thank you for your reply!

I have money entering a market quite quickly at different prices. This money decays over a time period of 30 seconds so the graphs won't keep getting larger, they should stay relatively the same size depending on the consistency of the rate that money enters the market (Imagine the floor has a tiny hole in it and the bars are made up of sand so it operates like a sand-timer). So I have the prices along the X axis and the amount of money being spread across different bars in my Y axis.
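The sand-timer description above can be made concrete with a small sketch. It assumes that each amount of money drains linearly to zero over the 30 seconds; the post does not say what shape the decay takes, so that choice, the function name `decayed_volume`, and the sample trades are all hypothetical.

```python
from collections import defaultdict

DECAY_SECONDS = 30.0  # decay period mentioned in the post


def decayed_volume(trades, now, decay_seconds=DECAY_SECONDS):
    """Return {price: money still remaining} at time `now`.

    `trades` is an iterable of (timestamp_seconds, price, amount).
    Assumption: each amount drains linearly to zero over `decay_seconds`.
    """
    bars = defaultdict(float)
    for timestamp, price, amount in trades:
        age = now - timestamp
        if 0 <= age < decay_seconds:
            bars[price] += amount * (1.0 - age / decay_seconds)
    return dict(bars)


# Hypothetical trades: (time in seconds, price, amount of money)
trades = [(0, 2.6, 100.0), (15, 2.6, 50.0), (20, 2.0, 30.0), (29, 4.6, 10.0)]
snapshot = decayed_volume(trades, now=30)
print(snapshot)
```

Calling this once a second with the current time would give the bar heights for each refresh of the chart, which is the raw material for any of the summary measures discussed later in the thread.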


I don't think I'm looking for the correlation between the two variables, but rather how previous distributions of the money may affect future moves, by examining the rate of change. I am testing the stability of this constant change and looking for clues (answers for which I have no questions yet). I theorise that the relationship is not sporadic at this point, or at least stable enough to be worth analysing. I'm now imagining a horizontal bar along the top of the graph (perhaps showing all averages, quartiles, etc.). If there's more area on the left, then future moves may be more likely to be to the left.

Does this sound like a good approach to you? It should show a truer picture than a simple trend line, as it accounts for all of the money shown in the different bars. Either this, or I need to make a distribution curve and somehow track whether the mean line is to the left or to the right of a stationary midpoint to indicate momentum in one direction or the other. Would the number of standard deviations play a part? Which approach do you think will more accurately show momentum/pressure points of my moving data over a period of time? As you can tell, I am still very much in the theoretical stage and am just envisioning the best way to capture my results to be analysed afterwards.

Cheers,

CPerry.
 
To build upon my last post, what about if I took the cumulative frequency of the ‘Total Time Decay’ and then created a box and whisker plot from here?

Taking the Q1, Median and Q3 from here I get the following numbers:
Q1 = 15.6892473
Q2 = 18.2301075
Q3 = 19.3483871

Here is the example in a table along with a cumulative frequency chart and the original bar chart... Screen Shot 2018-09-06 at 12.11.08.jpg
(I had to take a screenshot as this forum doesn't accept .xls files?)

Using some simpler dummy numbers, say Q1, Q2 and Q3 came out as 16, 20 and 22. The pull to the left is twice as strong as the pull to the right (the distance from the median to Q1 is 4, compared with 2 from the median to Q3). So this hypothetically indicates pressure for future money to 'more likely' appear to the left along the x axis than to the right? This is what I would like to examine.
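Comparing the two median-to-quartile gaps is what the quartile (Bowley) skewness coefficient does, and it already gives a single number between -1 and 1. The sketch below is only an illustration; the function name `quartile_skewness` is made up here, and a negative result means the longer gap is on the left.

```python
def quartile_skewness(q1, q2, q3):
    """Bowley's quartile skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1).

    Negative when the median sits closer to Q3 than to Q1
    (longer stretch on the left), positive for the opposite case.
    """
    spread = q3 - q1
    if spread == 0:
        raise ValueError("Q1 and Q3 are equal; skewness is undefined")
    return (q3 + q1 - 2 * q2) / spread


# Dummy quartiles from the post
dummy = quartile_skewness(16, 20, 22)

# Quartiles reported from the cumulative frequency chart
posted = quartile_skewness(15.6892473, 18.2301075, 19.3483871)

print(dummy, posted)
```

Because the result is scaled by the interquartile range, it can be tracked from one second to the next even when the bars spread out or bunch up.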

BUT this pull in either direction is away from the median, not the current price (the latest and most influential bar in this snapshot is at 2.6). Surely I'm trying to examine the pull away from the current price over time? Is it the current price, the mean, the median or the largest value that should be my middle point? Any idea how I could pull this off and express it as a single, easy-to-track number or percentage?
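One way to get a single number measured from whichever middle point is chosen is to weight each bar's money by its distance from that point, like weights on a seesaw, and compare the two sides. This is a sketch under that assumption only; `pull_around`, the sample bars and the choice of 2.6 as the pivot are hypothetical, and it does not settle which pivot is the right one.

```python
def pull_around(bars, reference):
    """Return a balance score in [-1, 1] around `reference`.

    `bars` maps price -> money. Each bar contributes money * distance
    from the reference price. Negative means the left side is heavier,
    positive means the right side is heavier, 0 means balanced.
    """
    left = sum(m * (reference - p) for p, m in bars.items() if p < reference)
    right = sum(m * (p - reference) for p, m in bars.items() if p > reference)
    total = left + right
    if total == 0:
        return 0.0
    return (right - left) / total


# Hypothetical snapshot: price -> money currently in that bar
bars = {2.0: 20.0, 2.6: 100.0, 3.0: 30.0, 4.6: 5.0}
score = pull_around(bars, reference=2.6)
print(round(score * 100, 1), "% net pull (positive = right)")
```

Passing the current price, the mean or the median as `reference` lets the same score be compared across the candidate middle points.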

Does my logic make sense? I think this works more like a seesaw measuring the pull either way by including all the data.

CPerry.
 
I have no idea how or why money decays, but that sort of question is not one that relates to mathematics at all.

As a matter solely of descriptive statistics, which is not the same as a predictive model, you seem to be interested in the skewness (and perhaps the kurtosis) of an empirical distribution. I stress that your data are empirical because whether one of the standard theoretical distributions reliably describes your data is unknown. What the concepts of skewness and kurtosis are is briefly explained here:

https://en.m.wikipedia.org/wiki/Skewness

https://en.m.wikipedia.org/wiki/Kurtosis
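For bar-chart data like this, both quantities can be computed directly by treating the money in each bar as a weight on its price. The following is a minimal sketch using the population (not sample-corrected) moment formulas; the function name `weighted_moments` and the example bars are assumptions for illustration.

```python
def weighted_moments(bars):
    """Return (mean, std_dev, skewness, excess_kurtosis) for price -> money bars.

    Uses population moments with money as the weight:
    skewness = m3 / m2**1.5, excess kurtosis = m4 / m2**2 - 3.
    """
    total = sum(bars.values())
    if total <= 0:
        raise ValueError("total money must be positive")
    mean = sum(p * m for p, m in bars.items()) / total
    m2 = sum(m * (p - mean) ** 2 for p, m in bars.items()) / total
    if m2 == 0:
        raise ValueError("all money sits at one price; moments undefined")
    m3 = sum(m * (p - mean) ** 3 for p, m in bars.items()) / total
    m4 = sum(m * (p - mean) ** 4 for p, m in bars.items()) / total
    return mean, m2 ** 0.5, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical bars with a long tail to the right
mean, sd, skew, kurt = weighted_moments({2.0: 20.0, 2.6: 100.0, 3.0: 30.0, 4.6: 5.0})
print(mean, sd, skew, kurt)
```

A positive skewness here means the tail of money stretches to the right of the mean, a negative one to the left; as noted above, this describes the current snapshot and is not by itself a prediction.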

Descriptions are not predictions. A prediction for the future from past numbers is based on the assumption that, for some reason, the numbers are generated by a process that cannot alter rapidly. If that assumption is incorrect, any prediction based on that incorrect assumption may be extremely misleading.

EDIT: You might want to consider the assumption of reversion to the "mean." If current behavior is not representative of typical behavior as exemplified over an extensive body of past data, then the prediction is that the current behavior will not persist. If the annual increase in stock prices over the last 140 years has been measured as an 8% geometric mean and this year it is proceeding at a 28% rate, the prediction is that, within a few years, price increases must fall below an 8% rate or even become negative for that 8% long-term average to be maintained. The "mean" in this assumption is not necessarily something as simple as an arithmetic mean; it could mean anything that has proved to be stable on average over a long period of time, such as the relative standard deviation or some measure of skewness.
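The arithmetic behind the stock example can be sketched as follows. It assumes a deliberately short window: one year at the unusual rate followed by a fixed number of years at a constant rate, with the geometric mean over the whole window required to equal the long-run figure. The function name `required_rate` and the three-year horizon are hypothetical choices.

```python
def required_rate(long_run, current, years_remaining):
    """Constant annual rate needed over `years_remaining` years so that,
    after one year at `current`, the geometric mean growth over the
    whole (1 + years_remaining)-year window equals `long_run`.
    """
    if years_remaining < 1:
        raise ValueError("need at least one remaining year")
    target_growth = (1.0 + long_run) ** (years_remaining + 1)
    return (target_growth / (1.0 + current)) ** (1.0 / years_remaining) - 1.0


# One year at 28%, then three years to get back to an 8% geometric mean
r = required_rate(long_run=0.08, current=0.28, years_remaining=3)
print(f"required annual rate: {r:.2%}")
```

Under these assumptions the required rate comes out well below 8%, which is the sense in which an unusually strong year predicts weaker ones if the long-run mean is stable.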
 
Thank you so much for such a detailed and informative reply. I now have a better understanding of where to look and what to consider. You have been very helpful Jeff, it is much appreciated mate :)
 