Getting Only What You Need

by Elizabeth Wetton

30 June 2015

Something that has always frustrated me about the Internet is how much data companies require on sign-up and how little privacy there is regarding that data. Name, email, date of birth, country of residence: you can hardly sign up for any service on the Internet without giving some random company your entire life story. This never used to bother me; I used to be very open with my information. I was one of those Internet weirdos who, in the Web 1.0 era, used her full name.

However, after I came out as transgender, my attitude changed. All of a sudden my data was extremely important. Any information I broadcast could be used to target me or identify me as a potential target. This became immediately apparent the first time I was brought into my manager’s office to be told that someone from the Internet had contacted him about me. Whoops, I guess being open with my data had come back to bite me in the ass.

Because of this, I’ve become wary of sign-up forms and online surveys. Why do free and ad-supported services want so much of my data? Why would they want to know where I live? I’ve also become increasingly annoyed with the way companies ask for data. Why are gender options so limited? Why is race data so Americentric? I’m skeptical of companies that want users to be open with their data but are evasive about why they want it.

I know I’m not the only person with these concerns, either. There have been several high-profile examples of data abuse, as well as companies with archaic policies regarding even the simple data they collect. Companies request far too much data; most of it is bad or unrepresentative, and most of it goes unused. These aren’t difficult problems; they’re solvable if you pay attention to the desires of your users and the needs of your advertisers.

Use your words

If you’re designing a tool to collect data in English, lucky you! English is a robust language, and there are always at least a few ways to say the same thing. However, people often forget this and instead rely on the decades-old maxim “Keep it simple, stupid,” which is not only ableist but completely wrong-headed in this situation. KISS has its merits in other areas of design, but form and survey design, where you are working with data, is not one of them. Remember that some of the simplest-looking designs are the most complex to navigate. Overly simple questions on a form work the same way: the complexity doesn’t disappear, it just resurfaces as bad data. Use fewer words, but do not boil complex issues down to entry boxes with a single-word label, especially when those concepts are socially defined.

If you’re attempting to boil survey or form answers down to one word, you’re essentially subjecting people to the textual equivalent of a Rorschach test and trying to pigeonhole them into an answer defined by the question. “When you look at this amorphous and vague concept, what do you see?” The most important thing is never to forget that race, gender, and family name all carry social connotations depending on where your feet are standing in the world. What is “African American” to a “Jamaican Canadian”? What is “gender” to a non-binary individual? What is a family name to a Spanish person? If you truly need that information, either leave the question open-ended (“What is your cultural background?”) or be very specific (“What gender appears on your license?”). If these sorts of questions irk you in any way, chances are you have no business asking them in the first place. Surveys looking for census data should leave questions open-ended and collate the data later. Forms with any sort of legal implication should be very specific; otherwise you risk recording incorrect information.
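Here is what “collate the data later” might look like in practice. This is a minimal Python sketch under my own assumptions: answers arrive as free text, and the mapping from answers to reporting categories is written after reading the responses rather than before. `CATEGORY_MAP`, `normalize`, and `collate` are hypothetical names. Anything the mapping doesn’t cover is counted under its own label instead of being forced into the nearest existing box.

```python
from collections import Counter

# Hypothetical mapping from normalized free-text answers to reporting
# categories, written after reading the responses, not before.
CATEGORY_MAP = {
    "italian": "Italian",
    "italian canadian": "Italian",
    "jamaican": "Jamaican",
    "jamaican canadian": "Jamaican",
}


def normalize(answer: str) -> str:
    """Lower-case, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(answer.lower().replace("-", " ").split())


def collate(answers: list[str]) -> Counter:
    """Count answers per category. Unmapped answers keep their own label
    so a human can review them later instead of losing them to 'Other'."""
    counts: Counter = Counter()
    for raw in answers:
        key = normalize(raw)
        counts[CATEGORY_MAP.get(key, "Unmapped: " + key)] += 1
    return counts
```

With the answers `["Italian", "Jamaican-Canadian", "Italian Canadian", "Mixed"]`, `collate` counts two answers as Italian, one as Jamaican, and one as `"Unmapped: mixed"`, which stays visible for review rather than disappearing into a catch-all bucket.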

Be Transparent

Often I’ll come across a form that gives absolutely no justification for the information it asks for. At its purest, a sign-up form needs two fields: email and password. The email becomes the user ID and the password is used to authenticate the user, easy peasy. Everything beyond these two fields is extra and often unnecessary. Name and age could perhaps be given a pass, but most questions beyond that raise red flags about what a company will be doing with the information. Gender? Why? Will Rdio suddenly block “Ixnay on the Hombre” because I should’ve been listening to “Backstreet’s Back” at the time? Does the Skype app change its logo from blue to pink? Fuck no.

So why even acquire census information you will never use? If advertisers are pressuring you, you may genuinely need it. However, if that is the case, you’d be better served by stating it plainly to the user: “Yes, this service is funded by ads. That is why your interface will literally be covered in ads, and why we need some demographic information so our advertisers can mis-target you better.” That is all it takes: a small note on the form indicating which fields will be collated for ad data. Keep in mind the suggestions in the first section about how to phrase those questions. If you must ask them to satisfy your advertisers, do so, and be frank about it; do not underestimate your user base.
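One way to put “a small note on the form” into practice is to make the reason and the advertiser flag part of each field’s definition, so the form can display them and nothing unflagged gets shared. This is a hypothetical Python sketch, not any particular framework’s API: `Field`, `SIGNUP_FORM`, `validate`, and `ad_payload` are illustrative names, and only email and password are required.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    required: bool
    reason: str  # shown to the user next to the field
    shared_with_advertisers: bool = False


# Hypothetical sign-up form: only email and password are required.
SIGNUP_FORM = [
    Field("email", True, "Your user ID."),
    Field("password", True, "Used to authenticate you."),
    Field(
        "cultural_background",
        False,
        "Optional. Shared with our advertisers because this service is funded by ads.",
        shared_with_advertisers=True,
    ),
]


def validate(form: list[Field], submission: dict[str, str]) -> list[str]:
    """Return the names of required fields that are missing or blank."""
    return [
        f.name
        for f in form
        if f.required and not submission.get(f.name, "").strip()
    ]


def ad_payload(form: list[Field], submission: dict[str, str]) -> dict[str, str]:
    """Return only fields flagged as shared that the user actually filled in."""
    return {
        f.name: submission[f.name]
        for f in form
        if f.shared_with_advertisers and submission.get(f.name, "").strip()
    }
```

Because `ad_payload` reads only fields explicitly marked `shared_with_advertisers`, and only the ones the user chose to fill in, adding a new question to the form doesn’t send it to advertisers unless someone deliberately flags it.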

Do not guess

Want to offend someone very quickly? Take wild stabs in the dark at their gender, sexuality, and interests. Come up with an algorithm that uses someone’s speech and topics to try to determine their gender and sexual orientation. Make that information available to your advertisers and claim high accuracy for the algorithm. Sound like a bad idea? Well, that is exactly what some services do to avoid asking for basic census data while still catering to advertisers. This is often more offensive than asking for basic data, because it relies more on archaic norms and cultural assumptions than on hard data. This is especially true of US-based services: pretending that Silicon Valley’s cultural knowledge applies universally, anywhere outside the borders of California, is a recipe for bad data. California über alles.

Rather than trying to be secretive or deriving user data through language/topic analysis and divination, it’s far easier to just be upfront with your users. Trying to suss out information via language is Twitter’s answer to “collecting data.” It misgenders and misidentifies users with almost clock-like precision. In my own experience, there was a month when my gender changed week to week, and so did the ads. One week it was Tampax and the next it was Glenfiddich. I’m not genderfluid, but judging by the ads Twitter was feeding me, they had absolutely no idea who I am. Each time it changed, I snapped a screenshot and poked fun at Twitter’s poor algorithm. The underlying problem is that topics that aren’t decidedly “female” (e.g., threats to masculinity, feminine hygiene, pink things) are simply labeled masculine, which skews the gender readings. This leaves no middle ground for gender-neutral topics. All topics must be smooshed into a false binary, and this crap information is fed to advertisers who pay out the ass for this “service.”

This approach, while making the sign-up process easier, insults both users and advertisers by providing absolutely unreliable data and passing that unreliability on to the user through mis-targeted ads. It’s often offensive and dehumanizing because it boils people down to data points and tries to reassemble those data points into a picture of a person without any concrete knowledge of who that person is. Lastly, it offers the illusion of simplicity while completely and disturbingly overcomplicating the process. It’s an absolutely ridiculous system, especially if it is part of a paid service. It’s also the absolute nadir of Western tech culture: an attempt to remove all human interaction and give arbitrary data sole responsibility for determining what a human being is.

Why use a language algorithm when you could simply ask users what data they’re willing to share with advertisers? Let them fill that information in themselves through an optional dialogue at sign-up! Is obtaining good census and demographic data really that difficult, given that we have centuries of practice collecting and collating census information? According to the 2011 Canadian Census, 20,535 people live in the city of Hamilton. 12.8% of the population reported Italian as their primary language. I can tell you minute details about the city I grew up in, while Twitter cannot tell you your own gender, let alone your followers’. So tell me, which is the better way to derive information?

Conclusion

Data is what you make of it. Forms and surveys are often one of the few ways you interact with your user base, and the only way you can ascertain reliable demographic information. They may also be the first impression and the start of the trust relationship between the user and the service, so a bad sign-up form can be ruinous and leave a bitter taste in the user’s mouth. When you ask for data, remember to ask in more open-ended ways, and do the actual work of collating the answers into usable categories. Make sure none of your questions are offensive, culturally insensitive, or ethnocentric. Don’t collect data you will not use. Lastly, don’t underestimate your users or your advertisers. When you can meet those criteria, ask away; I see no problem with these questions as long as there is a reason for them and they are asked tactfully.

If you liked this article, consider reading the related article by Julie Pagano!