Find reasonable curves that fit the data

TS-otomn

## Background
I'm playing a game called Unrailed! where players (2-4) need to craft and place down rails as the train travels from one station to another (1 level). The stations are 35m apart and the train speed increases in steps as each level is cleared. In addition, the more players in the game the higher the train speed.

## Problem:
The attached txt file (should have been a csv) is the chart of the train speed displayed on the in-game HUD from 0m to 1995m. It is impossible to obtain more data beyond what's here. The goal is to find a set of 3 functions [imath]f_i(x)[/imath] that fit the data, where x can be either distance or level count. The three functions must share the same (reasonable) structure, but the constants can be different. It would be nice if the constants were reasonable numbers with no more than 3 significant digits. It would also be nice to find an ultimate formula [imath]f(x,n)[/imath], with n the number of players, but I doubt such a function even exists (without fancy tricks).

## Rounding:
The exact rounding algorithm used by the game is unknown. Nonetheless, assuming the game computes the speed from a continuous function rather than some iterative method, it should be possible to find a function with a maximum error of less than 0.0005. I have found that the game itself is not very consistent at rounding values, so anything with a maximum error below 0.001 is fine.
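As a rough sanity check on these tolerances, the sketch below rounds a made-up continuous formula to three decimals and measures the worst-case gap. The three-decimal display is an assumption inferred from the 0.0005 bound, and `speed` is a hypothetical stand-in, not the game's formula.

```python
# Sketch: how much error does rounding alone introduce?
# ASSUMPTION: the HUD value is the true speed rounded to 3 decimals.
import math


def speed(level):
    # Hypothetical continuous formula with made-up constants, NOT the game's.
    return 0.05 + 0.002 * level + 0.001 * level * math.log(level + 1)


def max_abs_error(model, xs, ys):
    """Largest absolute gap between model(x) and the observed y."""
    return max(abs(model(x) - y) for x, y in zip(xs, ys))


levels = list(range(58))                          # 0 m to 1995 m in 35 m steps
displayed = [round(speed(x), 3) for x in levels]  # what an ideal HUD would show

err = max_abs_error(speed, levels, displayed)
print(f"max error from rounding alone: {err:.6f}")
```

With ideal rounding the gap cannot exceed half of the last displayed digit, so a candidate whose maximum error is well above that is either the wrong formula or is being compared against inconsistently rounded values.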

## My work so far:

### First guess
My first guess is [imath]f_i(x)=ax+b+\frac{c}{x}[/imath], where x is shifted by a constant s to avoid 1/x being undefined at 0. I think this is a good guess because [math]\lim_{x\to\infty}\left(f_i(x)-(ax+b)\right)=\lim_{x\to\infty}\frac{c}{x}=0[/math] so the speed becomes linear for large x, which is reasonable from a game design perspective. Using the `Fit` function in GeoGebra while varying s, I found it hard to get a fit with low error. In addition, when I plot the error as points on the same graph, it shows a very noticeable pattern. I take this as an indication that the formula is wrong.

(the messy dots represent error multiplied by 100)
Screenshot 2023-04-03 at 22.00.12.png
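For a fixed shift s this model is linear in a, b and c, so the search over s can also be done outside GeoGebra. The following is a minimal sketch under stated assumptions: the data is synthetic (made-up constants, rounded to three decimals), and `solve_least_squares`, `basis` and `fit_for_shift` are helper names introduced here for illustration.

```python
# Sketch: fit f = a*t + b + c/t with t = x + s, for several candidate shifts s.
# ASSUMPTION: synthetic data from made-up constants, not the game's chart.


def solve_least_squares(rows, ys):
    """Least-squares coefficients via normal equations and Gauss-Jordan elimination."""
    n = len(rows[0])
    # Augmented normal-equation matrix [A^T A | A^T y]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(n):
            if k != col:
                factor = m[k][col] / m[col][col]
                m[k] = [vk - factor * vc for vk, vc in zip(m[k], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def basis(x, s):
    t = x + s
    return [t, 1.0, 1.0 / t]


def fit_for_shift(xs, ys, s):
    """Return (coefficients, max absolute error) for one value of s."""
    rows = [basis(x, s) for x in xs]
    coef = solve_least_squares(rows, ys)
    err = max(abs(sum(c * v for c, v in zip(coef, row)) - y)
              for row, y in zip(rows, ys))
    return coef, err


xs = list(range(58))                       # level count, 0 to 57
true_s = 3.0                               # made-up shift used to generate the data
ys = [round(0.004 * (x + true_s) + 1.0 - 1.2 / (x + true_s), 3) for x in xs]

results = {s: fit_for_shift(xs, ys, s) for s in (1.0, 2.0, 3.0, 4.0, 5.0)}
best_s = min(results, key=lambda s: results[s][1])
for s, (coef, err) in results.items():
    print(s, [round(c, 4) for c in coef], round(err, 5))
print("shift with the smallest max error:", best_s)
```

Least squares minimizes the sum of squared errors rather than the maximum error, so the reported maximum is only an upper bound on what a minimax fit could reach for the same s.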

### Fit the delta
With no luck finding the formula directly, I changed my strategy to finding the curve of best fit for the delta values. I discovered that [imath]f'(x)=a+b \ln(x)[/imath] seems to fit very well with [imath]s=1[/imath] and x representing the level count. The pattern in the error is still noticeable, but that is likely purely due to rounding. I personally like this one, because at least s is a sane number.

Screenshot 2023-04-03 at 22.10.00.png
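The delta fit can be sketched as follows. Because [imath]a+b\ln(x)[/imath] is a straight line in [imath]\ln(x)[/imath], ordinary two-parameter regression is enough. The speeds here are synthetic (made-up constants `TRUE_A`, `TRUE_B` and starting speed), and the three-decimal rounding is an assumption.

```python
# Sketch: fit delta ≈ a + b*ln(x) to the differences between consecutive levels.
# ASSUMPTION: synthetic speeds built from made-up constants, rounded to 3 decimals.
import math


def fit_log_line(xs, ds):
    """Closed-form least squares for d ≈ a + b*ln(x)."""
    us = [math.log(x) for x in xs]
    n = len(us)
    mean_u = sum(us) / n
    mean_d = sum(ds) / n
    b = (sum((u - mean_u) * (d - mean_d) for u, d in zip(us, ds))
         / sum((u - mean_u) ** 2 for u in us))
    a = mean_d - b * mean_u
    return a, b


TRUE_A, TRUE_B = 0.01, 0.005          # made-up constants
speeds = [0.5]                        # made-up starting speed at level 0
for k in range(1, 58):
    speeds.append(speeds[-1] + TRUE_A + TRUE_B * math.log(k))
shown = [round(v, 3) for v in speeds]

deltas = [shown[k] - shown[k - 1] for k in range(1, len(shown))]
xs = list(range(1, len(shown)))       # level count shifted by s = 1
a, b = fit_log_line(xs, deltas)
print(f"a = {a:.5f}, b = {b:.5f}")
```

Each delta is the difference of two rounded values, so its rounding error can be up to twice that of a single reading, which is consistent with a visible pattern in the delta residuals.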

### Second guess based on the derivative
Assuming the curve of best fit for the delta values is the derivative, my second guess is [imath]f_i(x)=ax+b+c x \ln (x)[/imath]. I assumed that since [imath]s=1[/imath] works well for the derivative, it would work well here too. It turns out that's not the case. The error is minimized with [imath]s=5[/imath], but the fit is poor near the start.
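For reference, a sketch of where this form comes from, assuming the delta fit is treated as an exact derivative with constants written here as [imath]a'[/imath] and [imath]b'[/imath] (names introduced only for this derivation):

[math]\int \left(a'+b'\ln x\right)dx = a'x+b'\left(x\ln x-x\right)+C=(a'-b')\,x+C+b'\,x\ln x[/math]

which matches [imath]ax+b+cx\ln(x)[/imath] with [imath]a=a'-b'[/imath], [imath]b=C[/imath], [imath]c=b'[/imath]. One caveat: the deltas are differences between consecutive levels, not derivatives. If they are summed over integer levels instead of integrated,

[math]\sum_{k=1}^{n}\left(a'+b'\ln k\right)=a'n+b'\ln(n!)\approx a'n+b'\left(n\ln n-n+\tfrac12\ln(2\pi n)\right)[/math]

by Stirling's approximation, so the sum carries an extra [imath]\tfrac{b'}{2}\ln n[/imath] term that the integral lacks. This might be part of why [imath]s=1[/imath] does not carry over, though that is only a guess.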

### More terms
At this point, I tried throwing in more terms. [imath]f_i(x)=ax+b+c x \ln (x)+dx^2[/imath] seems to fit very well, but I'm not sure whether that's just me overfitting the data. If this is the case, the d term must be extremely small compared to the other terms. In addition, the [imath]dx^2[/imath] term dominates for large x, [math]\lim_{x\to\infty}\frac{ax+b+cx \ln (x)+dx^2}{dx^2}=1[/math] which is going to be a bit too harsh on the player in my opinion.
 

Attachments

  • Speed.txt
    1.3 KB
This looks challenging. At a glance, your result is already fairly good. It looks like you're reverse engineering a black box: based on its behaviour, you're trying to figure out which one of infinitely many possible ways it could have been made.
 
Thanks for the reply. Indeed, there are infinitely many functions that can fit a set of points. But knowing that the functions are written by people limits the possible solutions to only ones that have a few commonly used terms. This is like finding the equation for the law of gravity, which I think should be doable.
It is correct that my last attempt [imath]f_i(x)=ax+b+c x \ln(x) + d x^2[/imath] yields a very promising solution. The max error is only 0.0006. Perhaps I just need someone to convince me that using 4 terms here is not overfitting.
Screenshot 2023-04-04 at 10.38.56.png
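One common way to probe the overfitting worry is to fit on part of the data and check the error on the rest. The sketch below fits the 3-term and 4-term forms on the first 40 levels and reports the maximum error on the held-out tail. Everything in it is an assumption for illustration: the data is synthetic (generated from the 3-term form with made-up constants and [imath]s=1[/imath]), the split point is arbitrary, and the helper names are introduced here.

```python
# Sketch: hold out the last levels to see whether the d*x^2 term earns its keep.
# ASSUMPTION: synthetic data from made-up constants; swap in the real chart to use it.
import math


def solve_least_squares(rows, ys):
    """Least-squares coefficients via normal equations and Gauss-Jordan elimination."""
    n = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(n):
            if k != col:
                factor = m[k][col] / m[col][col]
                m[k] = [vk - factor * vc for vk, vc in zip(m[k], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def terms3(t):
    return [t, 1.0, t * math.log(t)]          # a*t + b + c*t*ln(t)


def terms4(t):
    return terms3(t) + [t * t]                # ... + d*t^2


def max_error(coef, terms, ts, ys):
    return max(abs(sum(c * v for c, v in zip(coef, terms(t))) - y)
               for t, y in zip(ts, ys))


S = 1.0                                       # assumed shift
ts = [level + S for level in range(58)]
ys = [round(0.5 + 0.006 * t + 0.004 * t * math.log(t), 3) for t in ts]

split = 40                                    # arbitrary train/held-out boundary
report = {}
for name, terms in (("3 terms", terms3), ("4 terms", terms4)):
    coef = solve_least_squares([terms(t) for t in ts[:split]], ys[:split])
    report[name] = (max_error(coef, terms, ts[:split], ys[:split]),
                    max_error(coef, terms, ts[split:], ys[split:]))
    print(name, "train max err %.5f, held-out max err %.5f" % report[name])
```

If the 4-term fit only wins on the levels it was trained on and does no better, or worse, on the held-out tail, the extra term is probably absorbing rounding noise. Checking whether the same structure also works for all three speed curves would be a second, independent test.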
 