"I think the impact depends on whether it is a standard or a non-standard task. Of course, if you give a simple task, like labeling images with the classes of physical objects, and provide some confusing instructions, people will still be able to perform the task accurately. But what if we give them an unusual task?" said Zack.
In contrast to the previous case, Zack mentioned the example of an unusual crowdsourcing task where the instructions clearly play a crucial role. The experiment used counterfactually augmented data: performers were given the task of editing a text document so as to produce a result with a counterfactual label, one that differs from the original. Only the changes required to 'flip' the applicable label were allowed. The authors iterated on the instructions many times, trying to make them clearer, and different sets of instructions produced completely different sets of results.
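To make the task concrete, here is a minimal sketch of what such a counterfactual pair might look like. The review text and labels are hypothetical, invented for illustration; the point is that the edit flips the sentiment label while leaving most of the document untouched, which can be checked with a similarity measure:

```python
import difflib

# Hypothetical counterfactual pair: the edit flips the sentiment
# label while changing as little of the text as possible.
original = ("The acting was wonderful and the plot kept me engaged.", "positive")
edited = ("The acting was dreadful and the plot kept me engaged.", "negative")

# How much of the original text survives the edit (1.0 = identical).
ratio = difflib.SequenceMatcher(None, original[0], edited[0]).ratio()
print(f"labels: {original[1]} -> {edited[1]}, text overlap: {ratio:.2f}")
```

A high overlap ratio combined with a changed label is the signature of a valid counterfactual edit; instructions that fail to communicate the "minimal change" constraint tend to produce rewrites with much lower overlap.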
"What do we mean by bias? A statistical meaning would be a sort of systematic error, one that cannot be overcome simply by collecting more samples. There is also a societal meaning: thinking of bias in terms of what benefit or harm is done. All such results will be biased relative to one another in a statistical sense; thus a normative question arises: which of these results is the right one? There is no way to answer this question without knowing how the data was collected, what decision it was meant to drive, and what the impact is on people. This chain of reasoning expresses the societal concerns associated with data collection."
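The statistical point above can be illustrated with a small simulation. The setup is invented for illustration: measurements of a quantity with true mean 0 are corrupted by a fixed offset (a systematic error). Averaging more samples shrinks the random noise but leaves the offset fully intact:

```python
import random

random.seed(0)

TRUE_MEAN = 0.0
OFFSET = 0.5  # hypothetical systematic error baked into every measurement


def biased_sample(n):
    # Each measurement = true value + random noise + the fixed offset.
    return [random.gauss(TRUE_MEAN, 1.0) + OFFSET for _ in range(n)]


for n in (100, 100_000):
    estimate = sum(biased_sample(n)) / n
    print(f"n={n}: estimated mean = {estimate:.3f}")
```

No matter how large n grows, the estimate converges to TRUE_MEAN + OFFSET rather than TRUE_MEAN: more data reduces variance, not systematic bias.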