The discussion covers the utility of detection tools in identifying the use of generative AI, and their limitations. AI detection tools analyze text for patterns that suggest it was generated by AI, and also analyze metadata to see whether text was pasted from other sources. These tools differ from similarity detection tools, which look for exact matches against a database. AI detection tools provide a probabilistic account rather than concrete evidence of AI use, and they can produce both false positives and false negatives. The decision to integrate these tools depends on an institution's goals and a cost-benefit analysis. The discussion suggests that it may be more productive to incorporate AI into assessment rather than focusing solely on detecting its use. The recording on Turnitin's AI detection tool can be accessed on the website.

Hi everyone, you're very welcome to the second of the Academic Integrity Unit's Bitesize sessions. Today our topic is: it was Colonel Mustard in the library with the Gen AI tool. The purpose of this discussion is to give you a brief overview of the utility of detection tools in detecting generative AI use, and their merits and limitations. I'm joined by my colleague, Dr. Fionn McGraw, who's an educational developer with the Academic Integrity Unit. Fionn, you're very welcome. Thanks for joining us.

Thanks a million.

So the first question I have for you today is: can you give me a little bit more information about AI detection tools and how they work?

Okay, before I get into it: on our website, the UL Academic Integrity Unit website, there is a short Turnitin video presentation that myself, Dr. Angelica Rizquez and Chris Hart from ITD did, where we go into more detail if anyone is curious. But on a surface level, the way the AI detection tools work is that they look for various features within the text that would suggest it had been generated by an AI large language model. It could be textual patterns. Given the capacity these systems have for hallucination, it can also look at consistency: if it sees an unusual, inconsistent deviation from what it would expect, that's a signal. There's also some metadata analysis, which isn't directly looking at whether the text was generated by a large language model, but at whether it has potentially been pasted in from various other systems. So there's a whole variety of different techniques being employed collectively, in an attempt to establish a probabilistic account of whether or not a large language model was used in generating that particular text.

That sounds very different to the similarity detection tools that are embedded in Turnitin, where my understanding is they basically look across all the literature out there and detect passages of text that have similarities. Is that correct? This works in a different way to that?

Exactly, yeah. The similarity score relies on an extraordinarily large database of texts that Turnitin have access to, and that grows as submissions come in, depending on what your institution chooses to contribute; a lot of institutions facilitate this, and certainly PhD theses and the like have been incorporated into that database. So that's an expanding database that they have. And what the similarity tool does then is, as it says on the tin, it looks to see if there are textual matches between a student submission and anything found within that database. While a match is not a guarantee that someone has plagiarized, it's much more difficult to account for how you used very similar, if not identical, language to something that's already been published in an area similar to what you were being assessed in. That's very much a cut-and-dried sort of system that has worked very well for a number of years in terms of detecting plagiarism.
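To make the distinction concrete, here is a minimal sketch of the similarity-style approach: exact word n-gram matching against a reference corpus. This is a toy illustration with invented data, not Turnitin's actual implementation; the `ngrams` and `similarity_score` helpers and the example texts are all hypothetical.

```python
# Toy sketch of similarity-style detection: exact n-gram matching against
# a reference corpus. Not Turnitin's implementation; purely illustrative.

def ngrams(text: str, n: int = 5) -> set:
    """Return the set of word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity_score(submission: str, corpus: list, n: int = 5) -> float:
    """Fraction of the submission's n-grams found verbatim in the corpus."""
    sub_grams = ngrams(submission, n)
    if not sub_grams:
        return 0.0
    corpus_grams = set().union(*(ngrams(doc, n) for doc in corpus))
    return len(sub_grams & corpus_grams) / len(sub_grams)

# An invented "database" of one published text, and a submission that
# reuses most of its wording verbatim.
corpus = ["the industrial revolution transformed patterns of work and "
          "settlement across europe"]
submission = ("historians argue the industrial revolution transformed "
              "patterns of work and settlement across europe decisively")

print(f"similarity: {similarity_score(submission, corpus):.0%}")  # -> 70%
```

The key property is that every match here is verifiable against a concrete source passage. AI-generated text has no such passage to point to, which is where the probabilistic approach discussed next comes in.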
The difference with the AI tool is that you're only ever going to get a probabilistic account, because it's tied to text that has been freshly generated, so there is no database to compare it to. People will talk about the fact that there's a kind of similarity in the output of the large language models, and as the models get more and more sophisticated, I think that becomes less and less true. But even if it were the case, that similarity in itself isn't something that can be checked against a database; you're only going on the probability of that similarity indicating that a large language model was used. So that's really the prime difference. The similarity score is checked against a database of information and looks for exact matches; the AI detection tool only offers a probabilistic account. And from our perspective, that is a huge gap from a disciplinary point of view, from the perspective of potentially accusing a student of engaging in plagiarism. They're a world apart, unfortunately.

Reliability seems like it deviates massively there. You've touched on quite a bit of this already, but maybe you could talk me through some of the pros and cons of using these tools as part of assessment. We're aware that some institutions, particularly in Australia, have these tools integrated into their Turnitin system. Why might you integrate them? Why might you choose not to?

In actual fact, I think the AI detection tools are quite good at what they do. The question is whether what they do is what educational institutions want, and I think in a lot of cases there's a gap there. By virtue of being probabilistic, there will always be the worry of false positives, and that's a major worry. But on the other side, there can be a lot of false negatives as well, depending on how high the threshold is set. So it's a useful system that Turnitin have developed; it's just a case of whether it does what the university wants it to do. And I think what the university wanted was something similar to the traditional Turnitin similarity check, and it's just not going to get there. I think that's an impossibility. The national guidelines that we have from the National Academic Integrity Network suggest that this isn't going to work, and a huge amount of research suggests the same, in the sense that it will not do what the Turnitin similarity score does in terms of plagiarism checks. There is a potential use for these systems down the line, if we can scaffold policies and processes around them, and that's an open question. If we were to use it as a catalyst for a conversation, that might be useful. But the problem with that is how those conversations are constructed, and how accusatory they are.
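To see why a probabilistic account behaves so differently, here is a companion sketch: a few weak signals of the kind mentioned above (textual patterns, consistency deviations, paste metadata) combined into a single score and compared against a threshold. Every name, weight, and number here is invented for illustration; real detectors use far more sophisticated features. The sketch shows the trade-off described above: lower the threshold and a human-written text gets flagged (a false positive); raise it and an AI-assisted text slips through (a false negative).

```python
# Toy sketch of probabilistic AI detection: weighted combination of weak
# signals, then a threshold. All signals, weights, and scores are invented.

def ai_likelihood(signals: dict, weights: dict) -> float:
    """Weighted average of per-signal scores, each in [0, 1]."""
    total = sum(weights.values())
    return sum(signals[name] * w for name, w in weights.items()) / total

weights = {"pattern_regularity": 0.5,    # stylistic uniformity
           "consistency_deviation": 0.3,  # odd internal inconsistencies
           "paste_metadata": 0.2}         # signs of pasted-in text

# Hypothetical feature scores for two submissions.
human_text = {"pattern_regularity": 0.62, "consistency_deviation": 0.40,
              "paste_metadata": 0.10}  # combined score: 0.45
ai_text    = {"pattern_regularity": 0.70, "consistency_deviation": 0.55,
              "paste_metadata": 0.30}  # combined score: 0.575

for threshold in (0.4, 0.6):
    for label, sig in (("human-written", human_text), ("AI-assisted", ai_text)):
        score = ai_likelihood(sig, weights)
        verdict = "flagged" if score >= threshold else "passed"
        print(f"threshold {threshold}: {label} scored {score:.2f} -> {verdict}")

# threshold 0.4: both flagged -> the human text is a false positive
# threshold 0.6: both passed  -> the AI text is a false negative
```

Unlike the n-gram matcher above, nothing in this output can be traced back to a source passage; the score is all there is, which is why it cannot serve as evidence on its own.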
And then there's the very concrete issue of cost, which is huge as well. These are not cheap systems, and there's a question of whether you're getting value for money when what you get is only indicative and only probabilistic, rather than cut and dried like the traditional alternative.

And the strength of the evidence it provides, as you say, for potential misconduct procedures, and the distress a false accusation could cause a student, is huge. So absolutely tread with caution there. I suppose my final question to you then is: in light of what we've discussed, will we ever have this tool integrated in UL? Or is it too early to say? Any thoughts on that?

We had it, and we switched it off. So there's absolutely a possibility that we could switch it back on again. It would probably be a discussion involving numerous different stakeholders, and one of the principal issues, again, would be cost: whether we decide it's actually worth it when you consider what you get for the money paid out. As I said, there is a possibility that, with new academic integrity policies and procedures in place, we would see a role for it and value for money stemming from that. But it's a case of whether people are content to have simply a probabilistic account, whether people are content with a conversation starter rather than a checker, rather than something that, as you said, will provide evidence. It's never going to get to a point where it will provide evidence. It may potentially be something that, down the line, is useful as a minor part of a much larger process. So I'd remain open to the possibility, but I'm inclined to think it will probably remain off, certainly for next year anyway.

And I think the other thing around that is that the work of our unit is providing resources and frameworks for staff to think about proactively integrating generative AI into assessment. Rather than trying to close the door and keep it out, how do we work with it now? As innovations start happening around assessment processes, AI use might become a more implicit part of assessment, and our concern around detecting it may not be as great, because we'll have a framework there to help students use it in their work. Do you think that's a fair assessment?

Exactly, yeah. Again, it comes back to this: the tool will do what it does, and it does that relatively well; it's just a case of whether that's what we want. And as you said, we're gradually realizing that we can't check for plagiarism within examinations or assessments as we traditionally did. So the question is: should we invest in these types of AI detection tools? Are they valuable? Do they work? When in actual fact, a more productive direction would be not transitioning away from, but looking reflectively at, the traditional modes of assessment, and, as you said, looking to incorporate the use of these large language models and generative AI into assessment, be it formative assessment or otherwise. Then the issue of detecting these models' use starts to fade into the background a bit. That detection is, as you said, still important for the security and integrity of our actual assessments.
But I think an awful lot of that work is going to be done, as you said, through re-evaluation, through even minor reflections on how we assess, and then through attempting to incorporate the actual AI systems into assessment rather than, as you say, concentrating on detecting their use.

Thanks very much for that. So as Fionn mentioned at the outset, there is a very helpful recording on our website on Turnitin's AI detection tool, recorded in February of this year. You can access it by going to the assessment tab on our website and having a look there. So thank you for tuning into this edition of Give Me About 10 Minutes On, and we'll have another recording available in about two weeks' time. Thank you.