Thank you, thank you, thank you for a great article addressing Over-Datafication (a term I coined several years ago). I worked in the Pulp and Paper Industry for 47 years. It always amazed me that vendors of data collection systems often used frequency of data as a selling point: "Wouldn't it be fantastic if we could get a measurement every 10 seconds instead of every 10 minutes?" Of course, this comes five years after they sold us the current system with "Wouldn't it be fantastic if we could get a measurement every 10 minutes instead of once per hour?" etc., etc., etc., ad nauseam.

Over-Datafication occurs when data is collected much faster than the process can change; the data becomes severely auto-correlated, making it useless for analysis of the "voice of the process". I once mentored a Six Sigma Black Belt candidate doing a project in a pulp mill. He was taking automated temperature data on a process stream that needed to be controlled within a few degrees. The data collection frequency was every 30 seconds, yet the Control Chart based on this data turned out to be useless in helping the operators control the temperature of the stream. It turned out that the data were severely auto-correlated, nowhere near independent of each other. We changed the data frequency to every half hour, and the Control Chart became a great tool for the operators. Temperature variability was reduced by over 50%, and temperature excursions outside the specification limits became almost non-existent.

Due to Over-Datafication, many industries are drowning in data but starved of information. Data is useless if it cannot be turned into information.

Note: Auto-correlated data makes Control Charting very difficult because the calculated upper and lower control limits become far too tight, and most of the data falls outside those limits.
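For anyone who wants to see the effect described in the Note for themselves, here is a minimal sketch in Python. All parameter values here are mine and purely illustrative (not the actual mill data): it models a slowly drifting process as an AR(1) series sampled "every 30 seconds," builds individuals-chart limits from the average moving range, then downsamples to "every half hour" and rebuilds the limits.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ar1(n, phi, sigma):
    """Slowly drifting process as an AR(1) series: x[t] = phi*x[t-1] + noise."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(0.0, sigma)
    return x

def individuals_limits(x):
    """Individuals (I) chart limits from the average moving range.
    The constant 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of size 2."""
    mr_bar = np.mean(np.abs(np.diff(x)))
    return np.mean(x) - 2.66 * mr_bar, np.mean(x) + 2.66 * mr_bar

def pct_outside(x):
    lcl, ucl = individuals_limits(x)
    return 100.0 * np.mean((x < lcl) | (x > ucl))

# "30-second" samples: successive readings barely differ, so the moving
# range -- and therefore the control limits -- come out artificially tight.
x_fast = simulate_ar1(12_000, phi=0.95, sigma=0.5)

# Keep every 60th point ("every half hour"): readings are now nearly
# independent, and the moving range reflects the true process variation.
x_slow = x_fast[::60]

print(f"30-sec data: {pct_outside(x_fast):5.1f}% of points outside the limits")
print(f"30-min data: {pct_outside(x_slow):5.1f}% of points outside the limits")
```

With these illustrative parameters, the oversampled series flags a large fraction of points as "out of control" while the downsampled series flags almost none: the same behavior we saw in the pulp mill.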