Your data is invalid: Collecting data on sex, gender, and sexuality

November 22, 2020

I’ve written previously about how I’ve acquired knowledge through my lived experiences with gender that I believe makes me a better information professional. One example is my experience of participating in research in which I am marginalised and erased because of my gender history. This has made me aware of the ways in which the resulting datasets fail to capture the experiences of trans and gender diverse (TGD) research participants, which means that the TGD population is not represented by, and does not benefit from, the outcomes of that research. Because of these experiences, I am inclined to centre the interests of research participants, communities, and society in my data curation practice.

The National LGBTI Health Alliance’s 2016 white paper, Making the Count: Addressing data integrity gaps in Australian standards for collecting sex and gender information, identifies a number of data integrity gaps in how sex and gender information is collected. Three of them are order effects, priming, and ecological validity.

Order Effects

How researchers ask about sex and gender, and the order in which they ask those questions, can affect the kind of responses they’re likely to get. For example, I consider myself to be a man. I don’t identify as transgender, but do acknowledge that I have a transgender history and lived experience. If I’m responding to a survey that asks for my gender, I would ideally prefer to identify myself as a man. But if I feel that it’s relevant for the study that the researchers know about my gender history and I’m not certain whether I will have any other opportunity to communicate this, I identify myself as a transgender man or write my gender in as ‘Man (assigned female at birth)’.

Given that these answers don’t actually represent my true gender identity, a better way of asking is to use a multi-step question. The first step is to ask whether a person has an intersex variation. The next step is to ask whether their gender identity is different from their assigned sex at birth. The third step is to ask about their current gender identity. For me, being asked whether my current gender identity is different from my assigned sex at birth before I’m asked about my current gender identity means that, by the time I reach the gender identity question, I’ve already communicated to the researchers that I’m a person of transgender experience. So when I come to answer about my current gender identity I feel confident responding that I’m a man with no qualifiers needed, which is the way I actually identify.
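The three-step question described above could be sketched in code along the following lines. This is only an illustration: the question wording, the field names, and the `collect` and `parse_yes_no` helpers are my own assumptions, not taken from the white paper or from any real survey tool.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical question wording; the order follows the three steps above.
QUESTIONS = [
    ("intersex_variation", "Do you have an intersex variation?"),
    ("gender_differs", "Is your gender identity different from the sex "
                       "you were assigned at birth?"),
    ("gender_identity", "What is your current gender identity?"),
]


@dataclass
class GenderRecord:
    intersex_variation: Optional[bool]  # None means unanswered or unclear
    gender_differs: Optional[bool]      # None means unanswered or unclear
    gender_identity: str                # free text, stored as written


def parse_yes_no(answer: str) -> Optional[bool]:
    """Map 'yes'/'no' to True/False; anything else becomes None."""
    normalised = answer.strip().lower()
    if normalised == "yes":
        return True
    if normalised == "no":
        return False
    return None


def collect(ask: Callable[[str], str]) -> GenderRecord:
    """Ask each question in order and keep the answers as separate fields."""
    answers = {key: ask(prompt) for key, prompt in QUESTIONS}
    return GenderRecord(
        intersex_variation=parse_yes_no(answers["intersex_variation"]),
        gender_differs=parse_yes_no(answers["gender_differs"]),
        gender_identity=answers["gender_identity"].strip(),
    )


# Scripted answers standing in for a real respondent.
scripted = iter(["No", "Yes", "Man"])
record = collect(lambda prompt: next(scripted))
```

Because gender history is stored in its own field, the gender identity field can hold the answer exactly as the respondent gives it, with no qualifiers.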


Priming

Priming is a psychological phenomenon in which a person’s exposure to a stimulus that triggers a memory or association can unconsciously affect their subsequent behaviour and responses. The National LGBTI Health Alliance white paper discusses how priming can affect data integrity in relation to collecting sex and gender information. For example, people are less likely to respond to questions that use misgendering and stigmatising language about them. In particular, people are less likely to respond to questions that other them, such as questions that ask people to select from the gender categories ‘Man’, ‘Woman’, ‘Other’, or ‘Male’, ‘Female’, ‘Other’. This is exactly how the ALIA Workforce Diversity Calculator collects data on gender. The calculator is a tool that is associated with the ALIA Workforce Diversity Trend Report 2019, which reports exclusively on binary man/woman genders, perhaps because of how the data was gathered.

Ecological Validity

One of the mistakes researchers can make is to fail to consider how real people with intersecting identities are supposed to answer their questions. This can result in survey questions that are difficult or impossible for some of us to respond to. For example, I once participated in a survey about coming out, which asked a series of questions about my coming out process in relation to my LGBTIQA+ (lesbian, gay, bisexual, transgender, intersex, queer, asexual, plus other marginalised sex and gender identity labels) identity. Like many people of trans experience, I had a number of ‘comings out’: I came out about being attracted to women when I was a young teenager, I came out about being attracted to men when I was in my mid-teens, then I came out about being a man when I was in my twenties. My experiences of coming out about my sexuality were very different from my experiences of coming out about my gender. The survey, which assumed I had a singular LGBTIQA+ identity and coming out narrative, required me to answer as if I felt the same about both. How should I respond to the questions? Choose to answer about just one of my coming out experiences and stick with that for the duration of the survey? If so, which one, gender or sexuality? No advice or guidance was given. Or should I try to answer for both simultaneously? Give the answer that fits both where they happen to match, and take the average between the two when they don’t? I went for the latter, in the end, but it felt like the answers I gave were all but meaningless.

And it’s not just trans people who are likely to have multiple coming out experiences. Even within sexuality, people may come out multiple times. For example, someone may come out as gay or bisexual, and then later realise that they are also on the asexual spectrum and come out as asexual. Consequently, the design of this survey would only have gathered meaningful data for cisgender LGB folk who have had stable sexual orientations throughout their lifetimes. That’s a lot of assumptions built into the survey design, and it ignores a big chunk of the LGBTIQA+ community (trans people, asexual people, people with fluid sexualities). I didn’t have much confidence that it would have gathered a useful dataset.

Good Practice

One survey that actually does a good job on all these fronts is the Australian Workplace Equality Index survey. This is actually four surveys in one: a general survey for all staff, a survey for staff with marginalised sexualities, a survey for trans and gender diverse staff, and a survey for staff with intersex variations. Participants get asked about their experiences and identity at the beginning, using questions that are ordered appropriately considering order effects, and are then shown the relevant sections of the survey based on their answers.
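The routing described above, where screening answers decide which sections a participant sees, could look something like the sketch below. The section names and the `sections_for` function are hypothetical, invented for illustration; they are not the Australian Workplace Equality Index survey’s actual structure or wording.

```python
from typing import List


def sections_for(
    marginalised_sexuality: bool,
    trans_or_gender_diverse: bool,
    intersex_variation: bool,
) -> List[str]:
    """Return the survey sections to show, based on screening answers."""
    sections = ["general"]  # every participant sees the general survey
    if marginalised_sexuality:
        sections.append("sexuality")
    if trans_or_gender_diverse:
        sections.append("trans_and_gender_diverse")
    if intersex_variation:
        sections.append("intersex")
    return sections


# A participant can be routed to more than one section.
both = sections_for(
    marginalised_sexuality=True,
    trans_or_gender_diverse=True,
    intersex_variation=False,
)
```

Each screening answer is treated independently, so someone with both a marginalised sexuality and a trans history answers about each experience in its own section rather than having to merge them.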

Data Curation

What does this mean for me and my work? Essentially that, in my opinion, involving people of diverse genders and sexualities in research is vital for designing better research instruments and improving data quality. Shoshana Rosenberg and P.J. Matt Tilley recently published an article on the ‘insider/outsider research staircase’, from consultation, to participatory research, to trans-led research. They were discussing this in terms of the members of the academic research team, but I think the same applies to the processes of data curation and data management.