Need help finding an equation that is a best fit for some data points

Steve Cannon

New member
Joined
Jun 26, 2017
Messages
2
Sorry if this should go into another category. I didn't know where to put it.

I am tracking some events that happen throughout the day and using Excel trendlines to give me an equation that best fits the data points. I have modeled the events up to 13:00 with a 6th order polynomial equation with good results. But after 13:00 the frequency of the events taper off until 15:00. And I was trying to track what percentage of the events have happened between the hours of 1300 and 1500 using a logarithmic trendline.

If you're not an excel user, time in excel are expressed in decimal form between 0 and 1. At midnight Excel represents the time with a 0. At 6 AM Excel uses .25 to represent the time. Noon is .5 and 6 PM is .75, etc.



My data points are:


0.934940.987810.992370.996000.997590.998830.999280.999671.00000
0.541670.552080.562500.572920.583330.593750.604170.614580.62500


The second line are decimal times from 1300 to 1500.

The first line shows what percentage of the events have taken place by the matched time.

So for example, 1500 (.625 in decimal) is the end of the day; therefore, 100% of the events will have taken place by then.

At 1300 (.54167) 93.494% of the events will have taken place.

My problem is that when I select a trendline, the logarithmic version is a TERRIBLE fit. Maybe it's a bug in Excel but no excel expert can find a problem. They all say that I must be doing something wrong in the math.

Excel gives me the equation is: y = 0.296ln(x) + 1.1495
and the R^2 is just 0.4813

I recognize that the data points I've given are not a logarithmic series, but I thought that would give the best R^2. My other excel choices are polynomial (a sixth order equation gives a better R^2 but still doesn't fit the data well. Also I can choose exponential and power. But it just seems like logarithmic should give the best equation.

It's been 30 years since I've done this kind of math. I've always just let excel do it for me. But I must not be understanding something. Is there something mathematical that I am doing wrong?

Thanks!

Steve
 
I am tracking some events that happen throughout the day and using Excel trendlines to give me an equation that best fits the data points. I have modeled the events up to 13:00 with a 6th order polynomial equation with good results. But after 13:00 the frequency of the events taper off until 15:00. And I was trying to track what percentage of the events have happened between the hours of 1300 and 1500 using a logarithmic trendline.

If you're not an excel user, time in excel are expressed in decimal form between 0 and 1. At midnight Excel represents the time with a 0. At 6 AM Excel uses .25 to represent the time. Noon is .5 and 6 PM is .75, etc.

My data points are:


0.934940.987810.992370.996000.997590.998830.999280.999671.00000
0.541670.552080.562500.572920.583330.593750.604170.614580.62500


The second line are decimal times from 1300 to 1500.

The first line shows what percentage of the events have taken place by the matched time.

So for example, 1500 (.625 in decimal) is the end of the day; therefore, 100% of the events will have taken place by then.

At 1300 (.54167) 93.494% of the events will have taken place.

My problem is that when I select a trendline, the logarithmic version is a TERRIBLE fit. Maybe it's a bug in Excel but no excel expert can find a problem. They all say that I must be doing something wrong in the math.

Excel gives me the equation is: y = 0.296ln(x) + 1.1495
and the R^2 is just 0.4813

I recognize that the data points I've given are not a logarithmic series, but I thought that would give the best R^2. My other excel choices are polynomial (a sixth order equation gives a better R^2 but still doesn't fit the data well. Also I can choose exponential and power. But it just seems like logarithmic should give the best equation.
Since you already know that a logarithmic regression appears to be a dreadful fit, on what basis have you determined that a logarithmic regression "should give the best equation"? What is the rest of the information for this exercise, which is leading you to this conclusion?

Please be specific. Thank you! ;)
 
Well, if you want me to be specific, then I'll have to embarrass myself. Excel shows pictures of sample curves. A logarithmic trendline has a steep slope at first then the slope flattens out. My data points have a steep slope at first then levels out.

Yep, that was the full genius of my logic.

Actually, I once minored in math but that was a long time ago. I don't remember anything but apparently my hunches still work. And, on a hunch, a few days after I posted, I threw out the first data point, divided 1 by the other data points, and was able to find a third order polynomial equation with a R2 of .999. I then inverted that equation to get back to my original data points. Don't know why it works. Don't care. I came up with a different equation to estimate the first data point that I threw out and, other than being ashamed at myself, I am pretty satisfied with the solution.

If you're interested, the equation that fits the inverted data points is:

y = -50.8844192626x^3 + 93.2544956698x^2 - 56.9944505190x + 12.6169264712

Thanks.




Since you already know that a logarithmic regression appears to be a dreadful fit, on what basis have you determined that a logarithmic regression "should give the best equation"? What is the rest of the information for this exercise, which is leading you to this conclusion?

Please be specific. Thank you! ;)
 
Top