Avoiding Shady Data: Implementing Google Analytics to Report Quality Data

We now know how to deal with existing shady data, but how do we avoid creating shady data in the first place?

In the first shady data blog post we looked at how to deal with existing shady data — that is, dirty, potentially misleading data with mysterious origins. We began to answer the question: How do you do effective, accurate analysis of shady web analytics data?

And while dealing with this data is certainly an essential skill to have, what would be even better is not having to deal with it at all. In a perfect world there wouldn’t be any shady data!

Of course, we don’t live in a perfect world, and there will always be shady data. But there are a number of things we can do to minimize it and to create crystal clear data: accurate data with known origins. So how do you work toward this ideal? How do you avoid creating bad data in the first place?

In short, by implementing your analytics as well as you possibly can. In long…well, keep reading.

Have a plan

I’m sure this isn’t a surprise to anyone, but it is impossible to implement a good analytics setup without doing at least some planning. Whether you’re just thinking about changing one setting in Google Analytics or you’re installing an entirely new tracking infrastructure, you need to have a plan before you start changing stuff. That may mean simply reviewing an existing plan to see if the proposed change makes sense, or it may mean starting from scratch to develop a new analytics plan.

If you’re one of the many who don’t have an analytics plan, here’s a quick primer on how to come up with one. It generally consists of answering six questions, one after the other:

Step 1: Why does my site exist? What are the goals of my site?

First and foremost, you must understand the purpose of the site(s) you’re working on. This is a no-brainer for those who may be setting up analytics from scratch, but even for those who are just thinking about making minor changes, it is vital to understand why the site exists. The purpose of the site is like the North Star: no matter how lost you are, you can look up and know which direction you should be heading. Web analytics master Avinash Kaushik explains the importance of this step and provides a framework for working through it.

Step 2: What questions do I want to answer using analytics?

Once you have a good understanding of the site’s purpose and goals, you can begin to think about why you’re using analytics at all. What are you hoping to get out of analytics? Can you use them to measure the goals you identified in step 1? What questions are you hoping to answer with analytics? Only after you have solid answers to these questions should you proceed to think about tracking specific things.

Step 3: What do I need to track in order to measure these goals and answer these questions?

Once you have answers to these first two questions, you can get more specific about the particular things you need to track. These things might include separate domains or subdomains, particular site elements like videos, or site features like site search. Without thinking too much yet about how you’re going to track them, you should compile a list of things that need to be tracked in order for you to measure your goals and answer the questions you identified in steps 1 and 2.

Step 4: How do I need to track these things?

“Tracking” something can mean a lot of different things. There are a lot of ways to track things — even the same thing — and you need to decide which is best. Decisions you might have to make include:

You have multiple domains or subdomains. For which ones, if any, would you like to be able to see data in the same report? And would you like to treat them as if they’re one site or separate sites? Is it more important to see the behavior of users as they move between sites or their behavior within the sites?
You want to track submissions of a form as a goal. Should you define the goal as an event based on submission of the form or a URL destination based on unique pageviews of the thank you page?
You want to track the carousel on your home page. Should you use events or virtual pageviews?
You want to track users’ scroll behavior. Should you designate your events as non-interactions or not?
You want to track whether users are logged in or not. What level(s) should you set your custom dimensions to?

All of these decisions will directly impact your implementation and therefore the data that you collect. Just about any metric you might care about (e.g. visits, bounce rate, pages per visit, visit duration, exit rate, new/returning visitors, or conversion rate) could be affected by the decisions you make in this step. And when nearly every metric in your reports could be affected by how you implement your analytics, you should know how you’re going to implement your analytics before you implement your analytics!
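
To make one of these decisions concrete, here is a minimal sketch of how the non-interaction choice for scroll events can move bounce rate. It assumes a simplified model in which a session counts as a bounce when it contains exactly one interaction hit. The Hit class, the helper names, and the four sample sessions are all made up for illustration, and real Google Analytics processing involves more than this.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    kind: str                      # "pageview" or "event"
    non_interaction: bool = False  # True means the hit is ignored for bounce purposes


def is_bounce(hits):
    """Simplified rule (an assumption): a bounce has exactly one interaction hit."""
    interaction_hits = [h for h in hits if not h.non_interaction]
    return len(interaction_hits) == 1


def bounce_rate(sessions):
    """Share of sessions that count as bounces under the rule above."""
    return sum(is_bounce(s) for s in sessions) / len(sessions)


def sample_sessions(scroll_is_non_interaction):
    """Four made-up single-page sessions; three of them fire a scroll event."""
    scroll = Hit("event", non_interaction=scroll_is_non_interaction)
    page = Hit("pageview")
    return [[page, scroll], [page, scroll], [page, scroll], [page]]


rate_interaction = bounce_rate(sample_sessions(scroll_is_non_interaction=False))
rate_non_interaction = bounce_rate(sample_sessions(scroll_is_non_interaction=True))
print(f"scroll as interaction:     {rate_interaction:.0%}")
print(f"scroll as non-interaction: {rate_non_interaction:.0%}")
```

The same four visits produce very different bounce rates depending on one setting, which is why this kind of choice belongs in the plan rather than being made on the fly.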

Step 5: How is the site currently being tracked?

In many cases, you will want to compare the new data you’re collecting with the data you collected before the change. If this is the case, then you’ll want to keep this in mind as you compile your implementation plan. Sometimes there are tradeoffs between tracking something correctly and tracking something in a way that allows for past comparisons, and you’ll have to balance those. Examples of decisions you might have to make include:

Whether to make events non-interactions or not
Whether to include new/different domains/subdomains in existing views
Whether to add filters that might change how page data shows up in your reports

Step 6: How will I need to implement analytics configuration changes so that I have the highest quality, least shady data? In other words, what’s my implementation plan?

This is the granddaddy of them all. The answers to all the previous questions inform the answer to this one. Ideally after going through steps 1-5, you’ve created a detailed implementation plan with every single thing you need to do. Exactly what code needs to go where. Exactly which views need to be created or changed. Exactly which filters should be added/changed/removed. Etc.

One other thing to keep in mind as you’re finalizing your implementation plan is that your site will change over time. So, try to set things up so that reasonable changes to your site won’t completely corrupt the data. Make your analytics setup durable by following guidelines like these:

Try to avoid defining goals based on specific link HREFs or link texts — these will likely change. Instead, whenever possible, try to define goals based on static URL destinations or broader types of events.
Try to avoid view filters (or goals, or content groups, or anything really) that are based on complicated regular expressions. If you have to resort to these, make sure you have a view or two that doesn’t rely on them, which could still provide data in the event of a filter not working.
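
If you do end up relying on a regular expression, it may help to try it against a handful of known paths before it goes anywhere near a view filter. The sketch below is one way to do that. The check_pattern helper and the sample paths are hypothetical, and Python’s re module is only a stand-in here: its syntax is not guaranteed to match what Google Analytics accepts, so treat this as a sanity check for simple patterns rather than proof that a filter will work.

```python
import re


def check_pattern(pattern, should_match, should_not_match):
    """Return a list of (problem, path) tuples; an empty list means no surprises."""
    compiled = re.compile(pattern)
    problems = []
    for path in should_match:
        # search() allows a match anywhere in the path, not only at the start
        if compiled.search(path) is None:
            problems.append(("missed", path))
    for path in should_not_match:
        if compiled.search(path) is not None:
            problems.append(("unexpected match", path))
    return problems


# Made-up request paths for illustration
blog_paths = ["/blog/", "/blog/shady-data"]
other_paths = ["/about", "/archive/blog/old-post"]

loose = check_pattern(r"/blog/", blog_paths, other_paths)
anchored = check_pattern(r"^/blog/", blog_paths, other_paths)
print("loose pattern problems:   ", loose)
print("anchored pattern problems:", anchored)
```

In this example the unanchored pattern also catches an archive page, which is exactly the kind of quiet mistake that turns into shady data months later.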

Implement Iteratively

Now that you’ve gone through the entire planning process (or reviewed an existing plan), it’s time to start implementing. You’re ready to start making stuff happen. So how do you implement this wonderful plan of yours?

The key is to do it iteratively. Although you have a comprehensive plan for what your analytics implementation will look like in the end, don’t assume that you’ll be able to do all of the configuration in one shot. A setup of any complexity should be implemented one step at a time. Start with the basics (standard tracking tags in Google Tag Manager, unfiltered views in Google Analytics, etc.), confirm that they’re working, and then move on to the more complex parts of your implementation (video tracking tags, custom reports in GA, etc.).
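
One way to keep yourself honest about working in order is to write the plan down as a list of items, each with the pieces it depends on, and only start an item once everything beneath it has been confirmed. The sketch below is a hypothetical illustration of that idea. The PlanItem class, the ready_items helper, and the dependencies between the items are assumptions made for the example, not anything Google Analytics or Google Tag Manager provides.

```python
from dataclasses import dataclass


@dataclass
class PlanItem:
    name: str
    depends_on: tuple = ()   # names of items that must be verified first
    verified: bool = False   # set to True once you have confirmed it works


def ready_items(plan):
    """Names of unverified items whose dependencies have all been verified."""
    verified = {item.name for item in plan if item.verified}
    return [
        item.name
        for item in plan
        if not item.verified and all(dep in verified for dep in item.depends_on)
    ]


# A made-up plan: basics first, more complex pieces depend on them
plan = [
    PlanItem("standard pageview tag"),
    PlanItem("unfiltered view"),
    PlanItem("video tracking tag", depends_on=("standard pageview tag",)),
    PlanItem("custom reports", depends_on=("standard pageview tag", "unfiltered view")),
]

print(ready_items(plan))   # only the basics are ready at first
plan[0].verified = True    # the pageview tag has been confirmed as working
print(ready_items(plan))   # video tracking is now unblocked
```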

As you’re going through this iterative process of setting things up (even if it’s just changing one setting in GA), make sure to do three things:

1. Test stuff. Don’t just assume you did it right. First, test immediately using real-time analytics if possible. Then, once you’re pretty confident it’s working, make the change public, wait a few hours (or until the next day if necessary), and look at the data that’s coming in from real users. This will quickly reveal easy fixes and improvements. You will almost certainly notice things that you didn’t foresee when you put together your plan or even when you tested things initially.

2. Annotate. As you finalize different elements of your implementation, annotate the changes you made. As I talked about in the first shady data blog post, these notes are incredibly useful when you’re trying to analyze past data. Don’t assume you’ll remember all the changes you made even as little as a week later!

3. Practice analyzing the data and/or pulling reports. A couple of days into your new configuration, to test how truly useful it is, try to answer the questions you identified in step 2. Make sure you are really measuring the goals you want to measure. Pull the reports you’re going to need to pull eventually. Knowing that the sample size is probably too small and you’re not yet going to take action on any of this analysis, try to actually use the data you’re collecting. Test to make sure the data that’s coming in is in the right format and is usable. If you find yourself unable to get the numbers you want, then you may need to go back and reevaluate your implementation plan and possibly change some aspects of your configuration.
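
As a rough illustration of this kind of practice run, the sketch below takes a few rows shaped like an exported report and checks two things: how much of the traffic has no usable value for a dimension, and whether a conversion rate can actually be computed. The column names (login_status, sessions, goal_completions), the helper functions, and the numbers are all invented for the example. The "(not set)" value is the placeholder Google Analytics shows when it has no value for a dimension.

```python
MISSING_VALUES = (None, "", "(not set)")


def missing_session_share(rows, dimension, sessions_column="sessions"):
    """Fraction of sessions whose value for `dimension` is missing or "(not set)"."""
    total = sum(row[sessions_column] for row in rows)
    if total == 0:
        return 0.0
    missing = sum(
        row[sessions_column] for row in rows if row.get(dimension) in MISSING_VALUES
    )
    return missing / total


def conversion_rate(rows, goal_column="goal_completions", sessions_column="sessions"):
    """Goal completions divided by sessions across all rows."""
    sessions = sum(row[sessions_column] for row in rows)
    goals = sum(row[goal_column] for row in rows)
    return goals / sessions if sessions else 0.0


# Made-up rows standing in for a small exported report
rows = [
    {"login_status": "logged in", "sessions": 40, "goal_completions": 6},
    {"login_status": "logged out", "sessions": 50, "goal_completions": 3},
    {"login_status": "(not set)", "sessions": 10, "goal_completions": 1},
]

print(f"sessions without login status: {missing_session_share(rows, 'login_status'):.0%}")
print(f"overall conversion rate:       {conversion_rate(rows):.0%}")
```

If a noticeable share of sessions comes back without a value, that is a hint that the custom dimension is not being set everywhere it should be, and it is much easier to catch in the first few days than after months of reporting.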

Be Flexible

Despite all the time and effort you’ve put into your analytics, remember that they won’t be perfect. You’ll make a mistake, or forget something, or make the wrong decision, or something will change. There will be shady data. But don’t worry, you’re flexible. You know how to deal with it.