Writing an expression based on data set

rmart100 · Sep 10, 2020

Hey I'm sure this is actually quite simple, but I'm having trouble figuring out how to approach this.

I have 2 data sets of related numbers with diminishing returns, so I know it's exponential. I want to write an expression that illustrates the relationship between these two data sets, but I'm failing pretty miserably.

Data set 1 contains: 2500, 5200, 7700
Data set 2 contains: 50, 66, 75 (These are percentages)

I assumed since visually the curve would appear to have a limit of 100, I'd be dealing with a squared function (Incorrect thinking, but I digress). 50 happens to be a perfect square of 2500, so that checks out as x^2. But 66 is not the perfect square of 5200, so there's a coefficient I need to account for that invalidates this as being a function of the 2nd power.

So right now I'm looking at aX^b = Y, where X is data set 2 and Y is data set 1. How would I solve for a or b?

Edit: After a bit more consideration, I suppose the answer for b could be limited to odd numbers, because a negative X should cause Y to mirror over the X axis. I'd really like to be able to move away from graphic visualization when considering these things.

Deleted member 4993 · Sep 10, 2020

rmart100 said:
Hey I'm sure this is actually quite simple, but I'm having trouble figuring out how to approach this.

I have 2 data sets of related numbers with diminishing returns, so I know it's exponential. I want to write an expression that illustrates the relationship between these two data sets, but I'm failing pretty miserably.

Data set 1 contains: 2500, 5200, 7700
Data set 2 contains: 50, 66, 75 (These are percentages)

I assumed since visually the curve would appear to have a limit of 100, I'd be dealing with a squared function. 50 happens to be a perfect square of 2500, so that checks out as x^2. But 66 is not the perfect square of 5200, so there's a coefficient I need to account for that invalidates this as being a function of the 2nd power.

So right now I'm looking at aX^b = Y, where X is data set 2 and Y is data set 1. How would I solve for a or b? I'm sure I could come reasonably close if I just plug in random numbers for 2 < b >1 but that feels dirty. Am I not thinking of this correctly, or do I need more information?

Since you expect the nature of the relationship to be:

Y = aX^b Where X is data set 2 and Y is data set 1

I'll rewrite this to:

ln(Y) = ln(a) + b * ln(X)

If we plot ln(Y) vs. ln(X), we will get a straight line with slope 'b' and 'ln(Y)'-intercept of 'ln(a)'.

continue.....
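A minimal sketch of this linearization, assuming the three data points from the first post and an ordinary least-squares line through the (ln X, ln Y) pairs. The helper name `fit_line` is made up for illustration; it is not something used in the thread.

```python
import math

# Data from the thread: X is data set 2 (percent), Y is data set 1.
xs = [50.0, 66.0, 75.0]
ys = [2500.0, 5200.0, 7700.0]

def fit_line(u, v):
    """Ordinary least-squares line v = slope*u + intercept (hypothetical helper)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suu = sum((ui - mean_u) ** 2 for ui in u)
    suv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    slope = suv / suu
    return slope, mean_v - slope * mean_u

# Linearize Y = a*X^b as ln(Y) = ln(a) + b*ln(X) and fit a straight line.
b, ln_a = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
a = math.exp(ln_a)

print(f"b = {b:.4f}, a = {a:.4f}")
for x, y in zip(xs, ys):
    # Compare each observed Y with the fitted power function.
    print(x, y, round(a * x ** b, 1))
```

With only three points the line will not pass through all of them exactly, so the fitted values are approximations rather than a recovered formula.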

rmart100 · Sep 10, 2020

Hm... Okay... So we're using ln rather than log because the base is unknown (I'm sure there's a grander rationale, but comes with foundational knowledge I don't have). The assumption of Y = aX^b is based on the conjectures written previously, though I don't have reason to believe it to be any other formula.

Plotting ln(Y) and ln(X), I got (3.912, 7.824) and (4.190, 8.556) rounded to the thousandth. The slope of that straight line comes out to 2.633. I'm not sure if 'b' is equivalent to the b of the original equation or the conventional use of b in a linear equation, but assuming it is of the original I can write out:

a50^2.633 = 2500
a50 = 2500^(1/2.633)
a50 = 19.5218
a = 0.39

And this number checks out against the second data point though the accuracy is a bit off due to cutting it off at the thousandths place. I can refine it from there to a more exact point.

Still, I feel like I skipped some crucial steps there, and I'm not particularly happy that I used a calculator for the odd root equation, but I suppose a calculator would have been necessary regardless at my level.

EDIT: After redoing it for more accurate terms, I found that the 3rd data point is significantly off, so I must have made an incorrect assumption somewhere. Updated terms are as follows:

b = 2.637911
a = 0.388282

These two terms work when X is 50, or 66, but at 75 I'm off by 414.605
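The numbers above can be checked with a short sketch. It assumes, as the arithmetic in this post suggests, that a was solved from (a*50)^b = 2500 rather than from a*50^b = 2500; the two readings describe the same curve, only with a different coefficient. The variable names are illustrative.

```python
import math

x1, y1 = 50.0, 2500.0
x2, y2 = 66.0, 5200.0
x3, y3 = 75.0, 7700.0

# Slope of the straight line through the first two points in log-log space.
b = math.log(y2 / y1) / math.log(x2 / x1)

# Reading 1: Y = (a*X)^b, which is what taking the b-th root of 2500 first solves.
a_inside = y1 ** (1.0 / b) / x1
# Reading 2: Y = a*X^b, the form proposed earlier in the thread.
a_outside = y1 / x1 ** b

pred_inside = (a_inside * x3) ** b
pred_outside = a_outside * x3 ** b

print(f"b = {b:.6f}")
print(f"a in (a*X)^b = {a_inside:.6f}, a in a*X^b = {a_outside:.6f}")
print(f"prediction at X = 75: {pred_inside:.1f}, miss = {y3 - pred_inside:.1f}")
```

A two-parameter curve can always be forced through two points, so matching 50 and 66 says nothing about how well the form describes the data; the miss at 75 is the first real test of it.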

Dr.Peterson · Sep 10, 2020

rmart100 said:
I have 2 data sets of related numbers with diminishing returns, so I know it's exponential. I want to write an expression that illustrates the relationship between these two data sets, but I'm failing pretty miserably.

Data set 1 contains: 2500, 5200, 7700
Data set 2 contains: 50, 66, 75 (These are percentages)

I assumed since visually the curve would appear to have a limit of 100, I'd be dealing with a squared function (Incorrect thinking, but I digress). 50 happens to be a perfect square of 2500, so that checks out as x^2. But 66 is not the perfect square of 5200, so there's a coefficient I need to account for that invalidates this as being a function of the 2nd power.

So right now I'm looking at aX^b = Y, where X is data set 2 and Y is data set 1. How would I solve for a or b?

Edit: After a bit more consideration, I suppose the answer for b could be limited to odd numbers, because a negative X should cause Y to mirror over the X axis. I'd really like to be able to move away from graphic visualization when considering these things.

A number of things make this hard to follow, starting with the fact that you put your "x" after your "y", and then said that 50 is "a perfect square of 2500", when it is in fact the square root.

When I plot your three points, taking "data set 2" as x and "data set 1" as y, the relation does look approximately exponential, though I don't know what you mean about "a limit of 100", or how this suggests "diminishing returns", or what you could mean about odd numbers. You seem to be confusing exponentials, y = a*b^x, with power functions, y = a*x^b.

But here is what I get when I put the data into Excel and ask for an exponential trend line:

You'd need a lot more data to confirm an exponential relationship; even these three points don't fit exactly, as R² is not 1. Their equation is equivalent to y = 263.11*1.046132^x.
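For readers without Excel, here is a rough sketch that reproduces an exponential trend line, assuming it is obtained by a least-squares straight line through (x, ln y). That assumption about how the spreadsheet computes it is not stated in the thread.

```python
import math

xs = [50.0, 66.0, 75.0]
ys = [2500.0, 5200.0, 7700.0]

# Fit ln(y) = intercept + slope*x by ordinary least squares.
n = len(xs)
ln_ys = [math.log(y) for y in ys]
mean_x = sum(xs) / n
mean_ln_y = sum(ln_ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (v - mean_ln_y) for x, v in zip(xs, ln_ys))
slope = sxy / sxx
intercept = mean_ln_y - slope * mean_x

# Convert back to the form y = a * base^x.
a = math.exp(intercept)
base = math.exp(slope)

# R^2 measured on the log scale, where the fit was done.
ss_res = sum((v - (intercept + slope * x)) ** 2 for x, v in zip(xs, ln_ys))
ss_tot = sum((v - mean_ln_y) ** 2 for v in ln_ys)
r_squared = 1.0 - ss_res / ss_tot

print(f"y = {a:.2f} * {base:.6f}^x, R^2 = {r_squared:.5f}")
```

The coefficients should land close to the quoted equation, and R² comes out just under 1, which is the "don't fit exactly" remark in numbers.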

If I ask for a power function rather than an exponential, to follow the form you suggested, I get this:

That fits a little less closely. Again, more data would reveal whether either of these is appropriate.

JeffM · Sep 10, 2020

I am going to take a different approach from Subhotosh Khan.

First, unless you have some very strong theoretical basis for favoring a particular kind of mathematical relationship between your variables, three data points are grossly inadequate. There are an infinite number of possible relationships among three data points. Can you get more data?

Second, you seemed to be talking about a quadratic relationship but said exponential. They are quite different.

Third, there may be qualitative constraints that can supplement actual data. Must one variable be positive or non-negative? If one variable can be zero, do you have an idea what that means about the other variable?

Fourth, there may be more than two variables involved.

EDIT: Having read Dr. Peterson's great reply, I have to reiterate that with this little data, closeness of fit means almost nothing. You can get a perfect fit using a quadratic, but the result is meaningless.
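To make the last remark concrete, here is a small sketch that passes a quadratic exactly through the three points using Newton's divided differences. The function names are illustrative and not from the thread.

```python
xs = [50.0, 66.0, 75.0]
ys = [2500.0, 5200.0, 7700.0]

def quadratic_through(px, py):
    """Newton-form coefficients of the unique quadratic through three points."""
    x0, x1, x2 = px
    y0, y1, y2 = py
    d01 = (y1 - y0) / (x1 - x0)      # first divided difference
    d12 = (y2 - y1) / (x2 - x1)
    d012 = (d12 - d01) / (x2 - x0)   # second divided difference
    return y0, d01, d012

def evaluate(coeffs, px, x):
    """Evaluate c0 + c1*(x - x0) + c2*(x - x0)*(x - x1)."""
    c0, c1, c2 = coeffs
    return c0 + c1 * (x - px[0]) + c2 * (x - px[0]) * (x - px[1])

coeffs = quadratic_through(xs, ys)
residuals = [y - evaluate(coeffs, xs, x) for x, y in zip(xs, ys)]
at_zero = evaluate(coeffs, xs, 0.0)

print("residuals:", residuals)            # all zero: a "perfect" fit
print("value at x = 0:", round(at_zero, 1))
```

The residuals vanish, yet the same curve gives about 8454 at x = 0, nowhere near the value one would expect there. A perfect fit to three points is guaranteed by construction and carries no evidence about the underlying rule.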

rmart100 · Sep 10, 2020

Dr.Peterson said:
A number of things make this hard to follow, starting with the fact that you put your "x" after your "y", and then said that 50 is "a perfect square of 2500", when it is in fact the square root.

When I plot your three points, taking "data set 2" as x and "data set 1" as y, the relation does look approximately exponential, though I don't know what you mean about "a limit of 100", or how this suggests "diminishing returns", or what you could mean about odd numbers. You seem to be confusing exponentials, y = a*b^x, with power functions, y = a*x^b.

But here is what I get when I put the data into Excel and ask for an exponential trend line:

View attachment 21546

You'd need a lot more data to confirm an exponential relationship; even these three points don't fit exactly, as R² is not 1. Their equation is equivalent to y = 263.11*1.046132^x.

If I ask for a power function rather than an exponential, to follow the form you suggested, I get this:

View attachment 21547

That fits a little less closely. Again, more data would reveal whether either of these is appropriate.

Sorry for the poor presentation and word choice on my part. I misspoke on the relationship of 50 and 2500. I presented the data sets Y and X simply because of stream of thought rather than any intentionality. I assumed the limit to be 100 because I know 100% is unachievable and would only require larger and larger inputs of Y for an infinitesimally smaller return. And I referred to odd numbers on a stupid observation that odd powered equations graph out with an inverse of the workable numbers (i.e. -5200 returning -60%). Though I suppose there's nothing in the observable data that solidifies either that 100% is unachievable or that -100% would conversely be the floor.

JeffM said:
I am going to take a different approach from Subhotosh Khan.

First, unless you have some very strong theoretical basis for favoring a particular kind of mathematical relationship between your variables, three data points are grossly inadequate. There are an infinite number of possible relationships among three data points. Can you get more data?

Second, you seemed to be talking about a quadratic relationship but said exponential. They are quite different.

Third, there may be qualitative constraints that can supplement actual data. Must one variable be positive or non-negative? If one variable can be zero, do you have an idea what that means about the other variable?

Fourth, there may be more than two variables involved.

EDIT: Having read Dr. Peterson's great reply, I have to reiterate that with this little data, closeness of fit means almost nothing. You can get a perfect fit using a quadratic, but the result is meaningless.

So this came about out of curiosity about mechanics in a game I play. Besides the optimization in the game it affords me, I know if I can understand the principles behind it I'll explore its applications in other areas in work or school where I'd like to aggregate data. As a disclaimer, I would never haphazardly use information I don't thoroughly understand as a basis for professional decisions, but I do like to tinker with it on side projects to see how reality holds up with those predictions.

With the data provided by the community I saw that those 3 data points relate to the reduction of damage and necessary points to achieve said reductions. 100% reduction in games is generally unachievable by virtue of rounding and requiring higher point investment for the next threshold, so I'm inclined to believe 100% is an unachievable limit, while 0% being the floor isn't something I can necessarily test for, but am highly confident in.

As for the second point, I can't say I honestly know the qualitative differences between quadratic relationships and exponential. I suppose I always assumed exponentials were a subset of quadratics.

Third point is partially answered by my initial response, though if one variable is zero the other should respectively be zero.

Fourth: that is a possibility. But given my limitations on data I would like to presume that there are no other variables involved, or that they present negligible contributions to the majority of the discrete range.

PS. I know this is, in light of the origin of the problem, incredibly low priority compared to other members' questions for practical or school-related application. Games just give me a medium where I can apply and ponder mathematical quandaries without real consequences, and have given me an opportunity to explore concepts I couldn't appreciate in high school.

Dr.Peterson · Sep 10, 2020

rmart100 said:
So this came about out of curiosity about mechanics in a game I play. Besides the optimization in the game it affords me, I know if I can understand the principles behind it I'll explore its applications in other areas in work or school where I'd like to aggregate data. As a disclaimer, I would never haphazardly use information I don't thoroughly understand as a basis for professional decisions, but I do like to tinker with it on side projects to see how reality holds up with those predictions.

With the data provided by the community I saw that those 3 data points relate to the reduction of damage and necessary points to achieve said reductions. 100% reduction in games is generally unachievable by virtue of rounding and requiring higher point investment for the next threshold, so I'm inclined to believe 100% is an unachievable limit, while 0% being the floor isn't something I can necessarily test for, but am highly confident in.

If the part about 100 means that you expect x=100 to be unachievable (suggesting a vertical asymptote), then clearly the relationship can be neither exponential nor polynomial. One possibility would be that 100-x would be an exponential function of y, so that y would be a logarithmic function of 100-x. Excel, with a little arm-twisting, gives me y = -7481 ln(100-x) + 31709 as a possibility.

But really, to find a formula you need a theory, not just a few numbers; you've told us some of the background, but we'd need to know a lot more in order to guess at a formula. It looks to me like you are, in part, hoping to reverse-engineer this feature of the game, and I know nothing at all about it.
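The logarithmic form mentioned above can be reproduced without a spreadsheet. This is only a sketch: it assumes the "arm-twisting" amounts to a least-squares line of y against u = ln(100 - x), which the post does not spell out.

```python
import math

xs = [50.0, 66.0, 75.0]
ys = [2500.0, 5200.0, 7700.0]

# The model y = m*ln(100 - x) + c is linear in u = ln(100 - x).
us = [math.log(100.0 - x) for x in xs]

n = len(us)
mean_u = sum(us) / n
mean_y = sum(ys) / n
suu = sum((u - mean_u) ** 2 for u in us)
suy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
m = suy / suu
c = mean_y - m * mean_u

print(f"y = {m:.0f} ln(100 - x) + {c:.0f}")
for x, y in zip(xs, ys):
    # Fitted y next to the observed y for each data point.
    print(x, y, round(m * math.log(100.0 - x) + c, 1))
```

As x approaches 100 the term ln(100 - x) goes to negative infinity, so with a negative m the fitted y grows without bound, which is the vertical asymptote described in the post.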

JeffM · Sep 10, 2020

Well, if 0 entails 0, you can reject a strict exponential right off the bat.
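Spelled out, using the two forms named earlier in the thread (exponential y = a*b^x and power function y = a*x^b), the argument runs as follows. This is a short derivation for reference, not something posted in the thread.

```latex
\begin{align*}
\text{exponential: } & y = a\,b^{x} \;\Rightarrow\; y(0) = a\,b^{0} = a, \\
& \text{so } y(0) = 0 \text{ forces } a = 0 \text{ and hence } y \equiv 0; \\
\text{power function: } & y = a\,x^{b},\; b > 0 \;\Rightarrow\; y(0) = 0 \text{ for any } a.
\end{align*}
```

So a pure exponential cannot pass through the origin unless it is identically zero, while a power function with positive exponent does so automatically.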

Moreover, with a bit of work I think I can show that a quadratic will not work.

That's the advantage of having more data to work with.

It is interesting that 50^2 = 2500. Assuming the game uses a relatively simple function, a few more data points might make it possible to find a reasonable approximation.

By the way, is it possible that 50, 66, and 75 are crude proxies for 1/2, 2/3, and 3/4? That might make the arithmetic easier to deal with.
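One way to probe the proxy idea is to compare the data against a simple saturating rule. The form below, percent = 100 * points / (points + k) with k = 2500, is purely a hypothetical guess chosen because it yields 1/2, 2/3, 3/4 at k, 2k, 3k; nothing in the thread or the game confirms it.

```python
# Hypothetical rule, NOT confirmed anywhere in the thread:
#   percent = 100 * points / (points + K)
K = 2500.0  # assumed constant, picked so that 2500 points gives exactly 50%

def percent(points):
    """Percent reduction under the assumed saturating rule."""
    return 100.0 * points / (points + K)

# Observed pairs from the first post: points -> percent.
observed = {2500.0: 50.0, 5200.0: 66.0, 7700.0: 75.0}

for pts, seen in observed.items():
    print(pts, seen, round(percent(pts), 2))
```

The guess is consistent with 0 entailing 0 and with 100% being unreachable, but it only approximates the three observations, so more data points would be needed before taking it seriously.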

rmart100 · Sep 10, 2020

Dr.Peterson said:
If the part about 100 means that you expect x=100 to be unachievable (suggesting a vertical asymptote), then clearly the relationship can be neither exponential nor polynomial. One possibility would be that 100-x would be an exponential function of y, so that y would be a logarithmic function of 100-x. Excel, with a little arm-twisting, gives me y = -7481 ln(100-x) + 31709 as a possibility.

But really, to find a formula you need a theory, not just a few numbers; you've told us some of the background, but we'd need to know a lot more in order to guess at a formula. It looks to me like you are, in part, hoping to reverse-engineer this feature of the game, and I know nothing at all about it.

Ah right, vertical asymptote was the word I was grasping for. Again, I apologize for my poor wording of the problem. My understanding is a mix of patchy memory, incomplete direction in high school as the need-to-know basis of most curricula, and attempts at self re-education now. I am curious, though, beyond a larger data set, what else would be needed to be a workable theory? Reverse engineering the game is in essence what I'm doing here, but I absolutely enjoy the greater understanding of why this is this and that is so. I'd pursue more textbooks to better understand the how, but I'm afraid it'll never teach me the why, so I would very likely not think to apply a majority of the things I do pick up with self teaching.

JeffM said:
Well, if 0 entails 0, you can reject a strict exponential right off the bat.

Moreover, with a bit of work I think I can show that a quadratic will not work.

That's the advantage of having more data to work with.

It is interesting that 50^2 = 2500. Assuming the game uses a relatively simple function, a few more data points might make it possible to find a reasonable approximation.

By the way, is it possible that 50, 66, and 75 are crude proxies for 1/2, 2/3, and 3/4? That might make the arithmetic easier to deal with.

The crude proxy idea sounds possible. Though the numbers that relate to them seem arbitrary. 2500 to 5200 for an increase of 16%? That's why I was hoping to find a formula that could illustrate it, as the Y data set would have some intermediary X value.

JeffM · Sep 10, 2020

I am not sure about asymptotic behavior because we may simply be dealing with a domain issue. It is reasonable to assume that percentages neither fall below zero nor exceed 100. But that is helpful only if the independent variable is in data set 2, which is the way Dr. Peterson did his graphs. That does not seem consistent with your initial remark about decreasing returns. If data set 2 is the dependent variable, we presumably have two horizontal asymptotes or a function defined piecewise.

rmart100 · Sep 10, 2020

JeffM said:
I am not sure about asymptotic behavior because we may simply be dealing with a domain issue. It is reasonable to assume that percentages neither fall below zero nor exceed 100. But that is helpful only if the independent variable is in data set 2, which is the way Dr. Peterson did his graphs. That does not seem consistent with your initial remark about decreasing returns. If data set 2 is the dependent variable, we presumably have two horizontal asymptotes or a function defined piecewise.

How do you figure it's two horizontal asymptotes? Data set 2 has values 0 - 100 and runs along the x axis, so shouldn't that be two vertical asymptotes at 0 and 100, and 1 horizontal asymptote at 0? I believe data set 2 is in fact the dependent variable.

JeffM · Sep 10, 2020

rmart100 said:
How do you figure it's two horizontal asymptotes? Data set 2 has values 0 - 100 and runs along the x axis, so shouldn't that be two vertical asymptotes at 0 and 100, and 1 horizontal asymptote at 0? I believe data set 2 is in fact the dependent variable.

The usual convention is that the dependent variable is plotted on the vertical axis so the relevant asymptotes would be parallel to the horizontal axis.

rmart100 · Sep 10, 2020

JeffM said:
The usual convention is that the dependent variable is plotted on the vertical axis so the relevant asymptotes would be parallel to the horizontal axis.

Ah, right, I see your point. Yes, in that case I totally understand. I only put the dependent variable on the x axis because it's easier for me to conceptualize a curve in that direction.

Dr.Peterson · Sep 10, 2020

rmart100 said:
Ah right, vertical asymptote was the word I was grasping for. Again, I apologize for my poor wording of the problem. My understanding is a mix of patchy memory, incomplete direction in high school as the need-to-know basis of most curricula, and attempts at self re-education now. I am curious, though, beyond a larger data set, what else would be needed to be a workable theory? Reverse engineering the game is in essence what I'm doing here, but I absolutely enjoy the greater understanding of why this is this and that is so. I'd pursue more textbooks to better understand the how, but I'm afraid it'll never teach me the why, so I would very likely not think to apply a majority of the things I do pick up with self teaching.

Frankly, a big part of the problem here is that you are trying to express the problem in mathematical terms, but thereby are speaking a language you don't know well enough. The best way to ask about something like this is to state the problem in its own terms, and leave the math to the people who are answering. Then we can try to explain things in a way that both shows you the math and uses terms you can follow.

JeffM said:
I am not sure about asymptotic behavior because we may simply be dealing with a domain issue. It is reasonable to assume that percentages neither fall below zero nor exceed 100. But that is helpful only if the independent variable is in data set 2, which is the way Dr. Peterson did his graphs. That does not seem consistent with your initial remark about decreasing returns. If data set 2 is the dependent variable, we presumably have two horizontal asymptotes or a function defined piecewise.

It was clearly stated that data set 2 is x. I suspect what is happening here is that the data really reflect data set 1 being the independent variable, but the goal is to invert the relation, finding what input value will produce a given output (data set 2, apparently the percent reduction). So the "diminishing returns" refers to the original function, not the inverse we are looking for.

rmart100 said:
How do you figure it's two horizontal asymptotes? Data set 2 has values 0 - 100 and runs along the x axis, so shouldn't that be two vertical asymptotes at 0 and 100, and 1 horizontal asymptote at 0? I believe data set 2 is in fact the dependent variable.

You're being inconsistent here. In your question, you want data set 2 to be the independent variable (input), on the horizontal axis; but in the game, it is the dependent variable (output). Again, this is a result of your apparently wanting the inverse (reversing the problem), and possibly of your using terms you aren't sure of.

We really know nothing about any asymptotes at all; but I suggested an asymptote (only one) because of your "diminishing returns" idea.
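The inverse relationship described here can be illustrated with the logarithmic fit quoted earlier in the thread, y = -7481 ln(100 - x) + 31709. The sketch below simply solves that equation for x; the function names are made up, and the coefficients are a fit to three points, not the game's actual rule.

```python
import math

# Coefficients quoted earlier in the thread for y = A*ln(100 - x) + B.
A = -7481.0
B = 31709.0

def points_for_percent(x):
    """Input (data set 1) needed for a given percent x, per the fitted curve."""
    return A * math.log(100.0 - x) + B

def percent_for_points(y):
    """Inverse direction: percent produced by a given input y."""
    return 100.0 - math.exp((y - B) / A)

for y in (2500.0, 5200.0, 7700.0, 15000.0, 30000.0):
    # Each extra block of input buys less additional percent.
    print(y, round(percent_for_points(y), 2))
```

Read in this direction the curve flattens toward 100 without reaching it, which is the "diminishing returns" picture with a single asymptote. It also returns roughly 31% at zero input rather than 0%, another reminder that three points do not pin the formula down.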

rmart100 · Sep 11, 2020

Dr.Peterson said:
Frankly, a big part of the problem here is that you are trying to express the problem in mathematical terms, but thereby are speaking a language you don't know well enough. The best way to ask about something like this is to state the problem in its own terms, and leave the math to the people who are answering. Then we can try to explain things in a way that both shows you the math and uses terms you can follow.

It was clearly stated that data set 2 is x. I suspect what is happening here is that the data really reflect data set 1 being the independent variable, but the goal is to invert the relation, finding what input value will produce a given output (data set 2, apparently the percent reduction). So the "diminishing returns" refers to the original function, not the inverse we are looking for.

You're being inconsistent here. In your question, you want data set 2 to be the independent variable (input), on the horizontal axis; but in the game, it is the dependent variable (output). Again, this is a result of your apparently wanting the inverse (reversing the problem), and possibly of your using terms you aren't sure of.

We really know nothing about any asymptotes at all; but I suggested an asymptote (only one) because of your "diminishing returns" idea.

Right, you got it all in there. As JeffM clarified for me, the y axis is conventionally used for the dependent variable but I plotted it backwards due to not being aware of that convention. Data set 2 is the dependent variable and I should have assigned it the y axis.
