Jake Worth

Prefer Real Data For Software Development

Published: August 18, 2022 • Updated: March 21, 2023 • 2 min read

  • data

When testing user input or data presentation, I prefer using realistic rather than random data. For a recipes app, that would mean adding an ingredient called “Carrot” instead of “asdf.”

This might sound like something everyone agrees on, but in practice, it isn’t.

Using Unrealistic Data

The antipattern I’m describing here: when testing forms, checking data presentation, or even seeding data, a programmer will enter gibberish or insider-y jargon instead of something a user would plausibly enter.

An example might be an address form, where the programmer types some combination of the letters “asdf” for each field. In this case, I think it’s better to enter “123 Milwaukee Ave.” or similar into the first address field, “Chicago” in the city field, and so on.

Or in a CMS data field for a page header, a programmer might type “Jake is Testing the Header”. In this case, I think it’s better to put “Dashboard”, or whatever would be appropriate in the context.

Or when seeding a database, a programmer might set every customer’s email to “foo@bar.com”. Here, it’s trivial to generate a series of realistic emails like “cyrus-1999@example.com”, and doing so makes the data a more valuable development asset.
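As a rough sketch of that last idea, here is how a seed script might generate such emails. Python is my assumption here; the function name `realistic_emails` and the name pool are hypothetical, and the same approach works in any language. The `example.com` domain is reserved for documentation and testing, so none of these addresses can reach a real inbox.

```python
import random

# Hypothetical name pool; a real seed script might load a longer list.
FIRST_NAMES = ["cyrus", "maria", "devon", "priya", "tomas", "lena"]
YEARS = range(1950, 2006)


def realistic_emails(count, seed=0):
    """Return `count` unique addresses shaped like cyrus-1999@example.com."""
    max_unique = len(FIRST_NAMES) * len(YEARS)
    if count > max_unique:
        raise ValueError(f"can only generate {max_unique} unique emails")

    rng = random.Random(seed)  # fixed seed keeps the seed data reproducible
    emails = set()
    while len(emails) < count:
        name = rng.choice(FIRST_NAMES)
        year = rng.choice(YEARS)
        emails.add(f"{name}-{year}@example.com")
    return sorted(emails)


if __name__ == "__main__":
    for email in realistic_emails(5):
        print(email)
```

Seeding the random generator means every developer on the team gets the same customers, which makes screenshots and bug reports easier to compare.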

Why Bother?

Why does this matter? I’ll allow it’s a subtle preference. But I’d make three arguments in its favor:

  • Realistic data stresses the software in realistic ways
  • It’s easier to work with
  • It presents a more polished environment

Realistic Stress

Real data stresses the software realistically. Perhaps your UI doesn’t display gibberish well: you entered a three-character email address into the form, and now the page looks broken. Is that worth solving? Probably not. No customer is going to type “abc” for their email, and our validations don’t allow that anyway. It’s a condition a customer will never experience. But by entering the bad data, you’re now thinking about it.

Or you’re filling in an address form, and the address validator says that “abc” isn’t a valid street address. Is considering this problem in development, or worse, in a demo, worth your time? No.

Easier to Work With

Real data is more pleasant to work with. I prefer to look at a website, even in development, with data that looks real. It helps me understand the experience and empathize with my users. A development environment full of keyboard-smashing is a broken window for me.

A More Polished Environment

Lastly, real data feels polished. I share a lot of screenshots with my team. I prefer that those images, and those of my teammates, look real. You never know where a screenshot from your development environment might end up. I’ve participated in more than a few demos where a stakeholder has derailed the presentation with a question like “hold on; we have a saved recipe called ‘Hotfix Test Please Work’? We need to fix that ASAP.”

Wrapping Up

A counterargument is that it takes a little more time to think of something realistic to type. I think that tiny amount of time is worth it, and it soon becomes a habit. Once you’ve spent a few hours on a bug report with a title like “The homepage is broken” and found out the root cause was a person replacing a CMS header value with nonsense, I think you’ll agree.

What are your thoughts on test data? Let me know!
