In a world where data is like gold, the data labeler can be compared to the gold refiner, the jeweler, or even the designer. Data labelers are the people who confer their knowledge and understanding of the world onto a data set by adding labels, enhancing the data set's value in ways that allow our most advanced algorithms to learn, teaching them what to consider and how to operate. Just like gold, a data set in its pure, raw form isn't nearly as valuable without the added contribution of these experts.

Once relegated to the sidelines, data labelers have come to be seen as co-creators of data sets and models. That's because it was widely understood that labelers were in the business of codifying their experiences, values, and beliefs onto our AI models, since the models themselves were built to mimic the labelers' perception of the world. Despite the prestige, data labelers continued to wear copper necklaces, copper being the same material as the cents they were paid back when annotation work was grossly undervalued. The necklaces were also a reminder of their solidarity with annotators worldwide.

This was life after the Orange Run, a bold act of defiance performed back in 2024. During the Orange Run, annotators organized themselves using a subreddit called RejectU.
The subreddit was open to all annotators, but was built specifically by and for those whose annotations had been mass rejected on Mechanical Turk, and there were many. As the story goes, mass rejections meant that annotation work annotators had spent hours performing went uncompensated because the people who hired them had decided, for whatever reason, that the annotations didn't deserve to be paid for.

In RejectU, the annotators assembled, coordinating how they would annotate across all of their data sets between September and November 2024. During this time, the annotators labeled the data sets to the letter of their instructions, exposing how poor the guidelines were and how much intuition those who commissioned the work expected annotators to supply. Annotators also began leaving comments on the data sets they received, exposing all of the oversights and misunderstandings that the people paying for the data labeling had had about their own data. They commented on over- and underrepresentation of examples in the data sets, too few options to choose from when labeling, and challenges with the user interface that were causing confusion and inconsistency. Beyond the mass rejections, labelers were protesting the subjugation of their roles in terms of salary, benefits, and their exclusion from the research and development process.

Ever since the Orange Run, those commissioning data annotation work have recognized labelers as playing an indispensable role in the quest to refine their gold. This recognition translated into substantially more meaningful engagement with data labelers throughout the AI development lifecycle: not only pay commensurate with that of other experts on the project, but also the transition of data labelers into other roles within the AI pipeline, improving their job prospects as well as the quality of our data sets and algorithms.

Does this story sound surprising?
Listen to our interview with Crystal Kaufman to hear more about how this might just be the world we're building.