Lately I have been thinking about data science and about how much writing I have been doing: blogging for work, blogging about football, and blogging about random things. What has been interesting to observe is how writing and data science complement each other. I figured I would write a quick blog post for anyone in an analytic profession describing how publishing or writing down your work complements your analytic capabilities. I am counting code comments as part of “writing” because in a lot of cases my write-up starts from an outline built out of my code comments. So here are a few ways that I think it helps tremendously:
It helps you finish projects
In a lot of cases it is easy to find quick nuggets of information in data. It’s a lot harder to talk about what those nuggets mean and why they are significant. Also, does that nugget lead to another interesting nugget?
A lot of the time a completed project exists only as code that few people besides you can understand. Comments in your code are important for understanding the code. Write-ups are important for understanding the findings you got from the model you used.
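To make the comments-to-outline idea concrete, here is a minimal sketch of how the comments in a Python script could be pulled out as the first draft of a write-up outline. The function name `comments_to_outline` and the example script are hypothetical, made up just for illustration. It is not a description of any particular tool.

```python
import io
import tokenize


def comments_to_outline(source: str) -> list[str]:
    """Collect the '#' comments in Python source, in order, as outline bullets."""
    outline = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            # Drop the leading '#' characters and surrounding whitespace.
            text = tok.string.lstrip("#").strip()
            if text:
                outline.append(f"- {text}")
    return outline


# Hypothetical analysis script, used only to illustrate the idea.
example_script = '''\
# Load the season's match results
rows = []
# Drop matches with missing scores
rows = [r for r in rows if r is not None]
total = len(rows)  # Count what is left
'''

print("\n".join(comments_to_outline(example_script)))
```

Using the standard library `tokenize` module instead of searching for `#` by hand means a `#` inside a string literal is not mistaken for a comment.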
It gives you a baseline to beat
There have been a lot of times where I never feel like a project is finished because I can constantly do x, y, or z better. While writing things up doesn’t directly remedy this, I feel like if you set regular weekly write-up deliverables for yourself, you are more likely to write down what you have. In a lot of cases what you have isn’t a finished product in statistics. There are so many models to try, so many transforms to do, and so many other omitted variables to bring into the fold. As George Box said, “All models are wrong, but some are useful.” It’s almost as important to get a baseline out quickly as it is to get the best model out eventually. It gives you something to beat. It also gives you something to show. It is important to emphasize in the write-up that this is just a baseline and that more analysis should be done before it is taken seriously. Be aware that project managers and executives will be ready to roll with whatever you say. It’s on you to make the model better.
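As a rough sketch of what “a baseline to beat” can look like in practice, the snippet below compares a predict-the-mean baseline against a one-variable least squares line on held-out data. The numbers are made up purely for illustration, and the helper names `mse` and `fit_line` are my own, not from any library.

```python
from statistics import mean


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Made-up data, for illustration only.
x_train = [1, 2, 3, 4, 5, 6]
y_train = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
x_test = [7, 8]
y_test = [13.9, 16.1]

# Baseline: always predict the training mean, ignoring x entirely.
baseline_pred = [mean(y_train)] * len(y_test)
baseline_mse = mse(y_test, baseline_pred)

# Candidate model: a straight line fit on the training data.
intercept, slope = fit_line(x_train, y_train)
model_pred = [intercept + slope * xi for xi in x_test]
model_mse = mse(y_test, model_pred)

print(f"baseline MSE: {baseline_mse:.3f}")
print(f"line MSE:     {model_mse:.3f}")
```

Writing the baseline number down first gives every later model a concrete figure to beat, and gives the write-up an honest point of comparison.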
It gives you a clear outline on where to go
In a lot of cases the process of writing up your findings will make you more critical in assessing model violations and will give you a direction to go next. There is almost always a conclusion/next steps portion to a write-up. When you write something up, you spend time explaining to yourself and others why you used what you used. I think the old adage that you don’t know a subject until you can teach it holds here. You don’t really know the data until you have to write about it. I have never finished a write-up thinking “that’s it, I’m done.” The process usually helps me think of questions that I didn’t have while writing the code, model assumptions I may have violated, or other pieces of data that I might want to add to the analysis.
It helps you communicate to anyone that relies on you for insights
This one is really important. This is also one of the biggest reasons I created this blog. Complex models are cool, but being able to explain complex models in a straightforward manner is extremely valuable. Scratch that. Being able to explain ANY statistical concept in a straightforward manner is extremely valuable. Not everyone you are going to talk to will have a master’s or PhD in statistics. Being able to clearly communicate your findings is a skill, not a given. It has been my experience that it is a skill that many data scientists put little stock in, to the detriment of their own careers. The data scientist that can explain advanced concepts to the everyman is the real hero, not the one that gets frustrated that nobody understands him or her. I have also seen scenarios where the correct numbers and verbiage were not used simply because the person that matters in the company didn’t understand what they had just been told. Writing stuff up helps you understand how they want to hear what you are trying to say. That sentence might sound weird, but it’s important. Trust me. We are outliers in our train of thought and approaches. The outlier should never simply be ignored, but investigated (corny stats joke, feel free to ignore).
It helps you understand what value your work adds
I have seen countless forays into analytics that are interesting, but what really matters is whether they are valuable. You can run the most bomb model in all of models, and if it gives a 2% accuracy boost over a linear model, why should anyone care? Outside of the learning I might have gotten from running that model in the wild, chances are nobody ever will. There are certain exceptions to this, but unless you are working on the most cutting-edge methods, I doubt most of us see much value in a 2% accuracy boost. I think a write-up helps you answer the question “Why is what I did important?”
Anyway, that’s my two cents on the matter. As always, feel free to say “You don’t know anything about anything” and move on, or bash me in the comments. I welcome your criticism. Maybe you think I missed or overlooked something. I welcome your insights.