Condensing Data, am I doing it right?

Blake (New member; joined Aug 18, 2020; messages: 1)
Hi,

So I've currently got a database full of records kept about bee hives by beekeepers. Eventually I'd like to be able to show a trend in whether a number of their metrics (health, population size, number of parasites, presence of a queen bee, etc.) have gone up or down over the course of six years.

The problem is that beekeepers check hives more frequently in spring than in winter, and they all check their hives on different days. So my instinct is to condense the "weekly numerical data reports" into "Monthly overview/summary" and "Quarterly overview/summary" reports (condensing 4368 random, haphazard dates into 1008 and 336 comparable datapoints).

The real question comes down to this: if I take something like 4 consecutive weeks where a metric like "docility" is [10, 10, 6, 10] (10 = very docile, 1 = angry), is it sensible to call the monthly summary score for how docile this hive was 9? Or am I setting myself up to lose too much nuance over the 6 years of data?

Thanks for any help from anyone! It's greatly appreciated!
 
Wow. Tricky problem. You CAN find a satisfactory solution, but some of it will be difficult to explain to your audience.


1) Sounds like some length of moving average may be beneficial. If it's long enough, you may be able to detect real seasonal differences, rather than just differences due to checking intervals.

2) Don't call things "random" unless you prove they are random. The point of the exercise is to find trends and relationships - which is the very point of the data not being random. Anyway...

3) You have to decide what data are meaningful to you and to your audience. With 10, 10, 6, 10, you have:
----- Mean = 36/4 = 9
----- Median = 10 (the two middle values of 6, 10, 10, 10 are both 10). Hard to interpret with only four data points.
----- Mode = 10 (Happens most often)
----- IQR and General Quartile Reporting. Again, tricky with only four data points.
----- or anything else you can imagine, catalog, and report.
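
If it helps to see those candidates side by side, here is a minimal sketch using only Python's standard `statistics` module on the four docility scores from the question. The extra fields (`n`, `min`, `max`) are my own suggestion rather than anything from your database; keeping them next to the mean is one possible way to keep the week-3 dip visible in a monthly summary.

```python
import statistics

# The four weekly docility scores from the question (10 = very docile, 1 = angry).
scores = [10, 10, 6, 10]

# Several candidate summaries of the same month, stored together.
summary = {
    "n": len(scores),                       # how many inspections went into the summary
    "mean": statistics.mean(scores),        # 36 / 4
    "median": statistics.median(scores),    # average of the two middle sorted values
    "mode": statistics.mode(scores),        # most frequent score
    "min": min(scores),                     # worst week of the month
    "max": max(scores),                     # best week of the month
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

A month stored as "mean 9, min 6, n 4" still tells you that one inspection was noticeably worse than the others, which a bare 9 does not.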

Really, it's up to you what data are important or meaningful.
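
To make the moving-average idea from point 1 concrete, here is a rough sketch, again with the standard library only. The inspection dates and scores are made up for illustration, and `monthly_means` and `moving_average` are hypothetical helper names, not part of any library. It first buckets irregular inspection dates by calendar month, then smooths the monthly means with a trailing window.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical inspection records: (inspection date, docility score 1-10).
records = [
    (date(2020, 3, 2), 10),
    (date(2020, 3, 9), 10),
    (date(2020, 3, 17), 6),
    (date(2020, 3, 27), 10),
    (date(2020, 4, 5), 8),
    (date(2020, 4, 20), 9),
    (date(2020, 5, 11), 7),
]

def monthly_means(records):
    """Group scores by (year, month) and average each group."""
    buckets = defaultdict(list)
    for day, score in records:
        buckets[(day.year, day.month)].append(score)
    return {month: mean(scores) for month, scores in sorted(buckets.items())}

def moving_average(values, window):
    """Trailing moving average; the first window-1 positions produce no value."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]

monthly = monthly_means(records)
smoothed = moving_average(list(monthly.values()), window=2)

print(monthly)
print(smoothed)
```

Two caveats on this sketch: a month with no inspections simply has no entry, so the moving average here treats the months it does have as consecutive, and the window of 2 is arbitrary. With six years of data, a 12-month window would be one way to average out the seasonal checking pattern.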
 