Episode 618: Academic Publishing on an AI Hamster Wheel

Nov 08, 2024 (1 hr)

Episode description

Guest: Lai Ma, assistant professor in the School of Information and Communication Studies at University College Dublin, discussing how AI will affect the scholarly publishing ecosystem, as described in her recent article: https://kula.uvic.ca/index.php/kula/article/view/287

First broadcast November 8 2024.

Transcript at: https://hdl.handle.net/1853/76513

"I meant to say I was feeling nauseated."

Transcript

- I run and I run and I run and I run and I run and I run and I run. And I get out, and I've gone nowhere, nowhere! [ROCK MUSIC]

CHARLIE BENNETT

You are listening to WREK Atlanta, and this is Lost in the Stacks, the research library rock and roll radio show. I'm Charlie Bennett in the studio with Cody Turner, Fred Rascoe, and-- oh my goodness-- Alex McGee.

ALEX MCGEE

In the flesh.

CHARLIE BENNETT

Hello.

FRED RASCOE

Holy moly.

CHARLIE BENNETT

Welcome back.

ALEX MCGEE

Thank you.

CHARLIE BENNETT

Are you feeling good?

ALEX MCGEE

Feeling good, feeling good.

CHARLIE BENNETT

OK. And nothing's changed about your life at all?

ALEX MCGEE

Nothing.

CHARLIE BENNETT

You just took a break?

ALEX MCGEE

Yeah, I just--

CHARLIE BENNETT

And now you're back.

ALEX MCGEE

--was, yeah--

FRED RASCOE

You were just gone?

ALEX MCGEE

--hanging out at home, not doing anything, sleeping a ton, yeah.

FRED RASCOE

I hope you enjoyed your hiatus.

[LAUGHTER]

FRED RASCOE

It was a vacation, right, a vacation?

ALEX MCGEE

Yeah, it was a vacation, yes.

CHARLIE BENNETT

We could hassle Alex all show about why she was gone, but we're not going to. Let me get back on track. Each week on Lost in the Stacks, we pick a theme and then use it to create a mix of music and library talk. Whichever you are here for, we hope you dig it.

ALEX MCGEE

Our show today is called "Academic Publishing on an AI Hamster Wheel."

FRED RASCOE

I run and I run and I run. So there are a lot of problems with scholarly publishing, the openness of it, the misinformation that sometimes happens--

CHARLIE BENNETT

Sometimes?

FRED RASCOE

--the bias that pervades it.

CHARLIE BENNETT

And then you throw AI in the mix, and everything-- I'm supposed to say, could just get worse. Everything is worse.

ALEX MCGEE

Today we'll talk to Dr. Lai Ma, an information science faculty member who thinks about how artificial intelligence platforms can exacerbate problems in an already troubled scholarly system of creating and sharing academic research.

FRED RASCOE

And our songs today are about the simulation of humanity, the scary unknown, and hallucinations. The sudden and haphazard application of artificial intelligence can make us feel like we're moving at a dizzying speed without actually going anywhere, like a technological hamster wheel.

CHARLIE BENNETT

Fred, I'm starting to feel a little nauseous. I run and I run and I run and I run. AI hallucinates, and the sudden, ubiquitous applications of AI are giving me hallucinations along with it.

Did I say nauseous? I meant nauseated. So let's start with a track that encapsulates that feeling of being trapped in a cycle that is spinning out of your control. This is "The Windmills of Your Mind" by Noel Harrison right here on Lost in the Stacks.

CHARLIE BENNETT

It's probably both, actually.

[NOEL HARRISON, "THE WINDMILLS OF YOUR MIND"]

FRED RASCOE

"The Windmills of Your Mind" by Noel Harrison. This is Lost in the Stacks, and our guest today is Dr. Lai Ma, who is going to introduce herself shortly. But when recording this interview, before I even got to my first question about AI's impacts on academic publishing, she wanted to ask about AI tools used in podcasts. So let's just jump right in.

LAI MA

Have you heard about the new tool, actually--

FRED RASCOE

Yes, the Google tool--

LAI MA

--NotebookLM?

FRED RASCOE

--that simulates-- yeah, it simulates a podcast? Yeah, I actually follow a librarian blogger, Aaron Tay, because he writes about things of interest to academic librarians, and he ran a blog post or two of his through that Google podcast simulation tool. And I listened to it, and it's uncanny. The one mistake that I heard was that one of the hosts said "JS tor" instead of JSTOR.

LAI MA

JSTOR, yeah.

FRED RASCOE

Right. But other than that, the language was so natural. The interplay between the two virtual hosts was so conversational. It did not sound robotic at all. Have you had experience listening to those?

LAI MA

So I tried it with the corpus of articles. I uploaded, I think, like, nine or 10 articles, and I said, turn it into a podcast. It took it five minutes. So it was-- yeah, it only took-- I think kind of like the scary part is it took it five minutes to digest all of this. And then I said, this is crazy. Maybe the subject of scholarly communication is too simple, so I actually fed a few papers by one of my colleagues and friends here.

So he's in medieval history, so something that I don't know about at all. So I just kind of fed it, and it turned it into the podcast, again around five minutes' time. And then I send it over to my friend, and I said, can you listen to this and then tell me-- because it's a subject that's kind of like-- it's medieval history, Irish and all of that, so maybe AI is not so good at it. But the thing is, he said, this is really scary.

It's faithful to the article, and it's also more interesting than the articles.

[LAUGHTER]

LAI MA

So it's actually translating into a language that's so easy to understand. It's interesting.

FRED RASCOE

Well, for now, we're going to do this podcast the human-to-human way, so. I'm glad that you're here and glad that you're talking to us. And I'd like to start by just asking you to introduce yourself and where you work and your research interests.

LAI MA

Sure. So my name is Lai Ma. I am an assistant professor at School of Information and Communication Studies at University College Dublin in Ireland. So my research is really what people called meta-research or research on research, but I mainly really focus on scholarly communication, in particular academic publishing, but also look a little bit about research metrics and societal impact, a lot of issues surrounding this particular area.

So currently, I'm doing a research project on open research, particularly in the humanities and social sciences.

FRED RASCOE

And on your-- I looked a little bit at your website online, and it talked a little bit about how you like to study the intersect of epistemic diversity with the research process and research tools. Can you talk a little bit about epistemic diversity and bibliodiversity? Because I want to talk with you about your work and your perspectives on what that is and how AI is going to affect that and our understanding of that.

LAI MA

So bibliodiversity and epistemic diversity-- I think different researchers just use these terms interchangeably. But mainly it is really about the diverse voices from different research areas from different regions because at the moment I think the majority of academic publications, they are actually from what we sometimes called the Global North. Or even, actually, we can actually think about mostly, actually, from the Western regions, so North America and Western Europe.

So bibliodiversity is really actually thinking about we really need a system that encourages more publications or a system that would allow for these marginal voices from other regions can be-- that we can actually hear them.

FRED RASCOE

I think a generally acknowledged problem in scholarly communication that you mentioned is that a lot of what is promoted as the scholarly corpus generally focuses on, as you mentioned, Western Europe, North America. And so your fear is that if artificial intelligence is the platform by which we're interrogating all of this scholarship, that diversity of thought, the problems of diversity of thought and understanding in scholarship is multiplied, magnified.

LAI MA

Well, I think maybe just take a step back here as well to maybe go back to the idea of bibliodiversity and why don't we have it right now.

And that particular question, maybe we need to actually look at how the dominance of indexing services, for example, which a lot of us rely on, so Scopus, Web of Science, these indexing services that we actually would see as holding the gold standard of publications-- however, when we are actually really thinking about the coverage of these indices, then we realize that-- so a lot of non-English-language journals are not being indexed.

At the same time, a lot of regions and countries, they are encouraging their researchers also to publish in what they call international journals, even though they are actually really mainly Western. So in that particular sense, they tend to-- really moving researchers from actually maybe publishing in their national or in journals that will speak their language to actually publishing in international journals. That really affect what we call bibliodiversity and epistemic diversity.

So when AI comes into play, I particularly kind of would be looking at tools that are coming up. So Scopus AI, for example, what they are advertising is really talking about, they can actually use the corpus indexed in Scopus to really generate literature review.

They would be able to-- similar to maybe ChatGPT-- I cannot be sure right now because I haven't used it-- to ask Scopus AI to say, what would be the research gap in a particular area based on what you know from the corpus that you have indexed and you have actually generate this beautiful map of concepts, for example?

So in this particular case, what it means really is that these tools are generating literature review and generating research questions based on largely Western-centric publications.

CHARLIE BENNETT

We'll be back with more from Professor Lai Ma about the challenges of AI in scholarly publishing after a music set.

ALEX MCGEE

File this set under SF459.H3H33.

[THE STEVES, "MECHANICAL FRIEND"]

(SINGING) My human features

ALEX MCGEE

That was "Human Features" by Black and before that "Mechanical Friend" by the Steves, songs about the simulation of humanity.

CHARLIE BENNETT

This is Lost in the Stacks, and we're back with Dr. Lai Ma of the University College of Dublin. We ended the last segment by discussing the challenges of bias in the collection of materials used to train AI. So Fred started this next segment asking if there was any chance that AI could also be used to mitigate that bias. Hamster wheel!

FRED RASCOE

Maybe it sounds a little bit counterintuitive, but you'd think that having a tool, an AI tool, that could process everything in such a short amount of time could not only search those North American, Western tools but also just bring in everything else and just really condense that and help the researcher-- help the researcher out. Is there a potential that, barring environmental problems of artificial intelligence, could AI have potential to solve these bibliodiversity problems?

Or is that just going too far?

LAI MA

I'd say-- no, I am not an AI expert, but at the same time, I think there are two issues maybe at hand. One is about the corpus that we are looking at. So because now I think some of the AI tools basically saying that we are not crawling the web, so we are not using ChatGPT or other tools that's just widely available because the idea is that we are using these corpus that's already passed the quality control.

So at the moment, I think the question will be kind of like, how do we evaluate or make sure that the corpus, if we are actually reaching out of these indices that we are very familiar with, that we are actually including publications from France or Algeria or in Chile, that they actually passed the quality control, that we actually want them into the corpus and then to produce the work that will be valuable and useful for researchers to use. I think that would be one challenge there.

And then I think another challenge really would be, actually, AI tools have not been trained on that many languages, so talking about podcasts, I listened to another podcast a while ago talking about in Africa there are over 1,000 languages. And they actually tested AI tools on these languages, and they were not able to understand them.

So in the sense that even though AI will be powerful, but currently it is really mainly trained on English-language materials, and I don't know how good or how bad they are when we are actually thinking about languages, particularly outside of Western Europe and North America.

FRED RASCOE

And also, like you were saying before, researchers are encouraged to publish in ways or in languages or in venues that are valued by the North American and Western Europe publications. To get noticed internationally, you have to publish here, X, Y, or Z, and that's not necessarily in your own native language or in your own native venue. It's somewhere-- so you're already changing how you communicate that scholarship to get it into the Western databases.

LAI MA

Yeah, absolutely. I think, at the same time, I have also read one story from Kenya. This researcher tried to actually publish her work in Scopus-indexed journals, and then the research was totally rejected. And then in her own words-- I'm just paraphrasing here now-- it's not because the research is not good. It's because what they regard the topics, the crops that she was actually doing research on, as weeds.

So the very idea is also kind of like, we are actually missing a lot of topics that could be very important in, I suppose, resolving a lot of crises in our globe and not just looking at societies in a sense because we tend to look at our communities. They are important. But actually from those regions they will be working on something that we might not actually recognize that they are important.

So again, you even-- so going back to the question of AI is that even the name of those crops came up maybe even in English in AI tool that would have been trained on English languages could be totally-- well, maybe non-existent but in the sense that they would be kind of just skipping over it because they have not actually seen this enough to think that that's important.

FRED RASCOE

Is it worth having the thought experiment of assuming that the corpuses the AIs are trained on are able to grab every language, every publishing venue, Western, Eastern, Northern, Southern-- and let's just do the thought experiment that it can have that entire corpus. Is an AI literature search for an academic then better using those AI tools, or is it just increasing the noise and the futility of it?

LAI MA

That's a very interesting question. If AI can understand every language as well as it understands English, so presumably they will also make better judgments in the near future to say, well, this piece of research is good, and that piece is not, then maybe. I don't know. But that sounds a little bit scary.

ALEX MCGEE

You're listening to Lost in the Stacks, and we'll talk about more scary things about scholarly publishing in an AI world on the left side of the hour.

[UPBEAT MUSIC]

KATRINA VANDEVEN

I'm Katrina Vandeven, and I'm coadministrator of the Women's March on Washington Archives Project and coordinator of the Oral History Collection.

DANIELLE RUSSELL

And I'm Danielle Russell, coadministrator and coordinator of the physical materials and photographic collection.

KATRINA VANDEVEN

You are listening to Lost in the Stacks, the original research rock and roll radio show--

DANIELLE RUSSELL

--on WREK Atlanta.

(SINGING) Sharp teeth In a broken jaw

FRED RASCOE

This is Lost in the Stacks. We're talking a lot today about how scholarly publishing is changing now and will continue to change in the near future, prompted by AI.

CHARLIE BENNETT

Yet again, Fred.

FRED RASCOE

While most of our interview today focuses on the problems of AI intersecting with publishing academic research, there are, of course, other problems with AI. The environmental cost of using it is a big one.

In a sidebar to our interview, I asked Lai Ma about whether she uses AI personally and, if not, why. Here is a clip of her response.

LAI MA

At the moment? No, I think someone will have to first define AI tools. If I use Google, does it count? But in terms of ChatGPT and others, no, I don't use it. And my first reason is environmental, and I think because of that particular reason we should just stop using it.

FRED RASCOE

I heard listening to the radio on the way in this morning that an AI search, for instance, in ChatGPT uses 10 times the power of just a regular Google search. And I'm sure a quick Google search probably uses 10 times the power of actually--

LAI MA

--picking up the phone.

FRED RASCOE

Right, exactly, calling someone.

[LAUGHTER]

FRED RASCOE

So we keep raising that bar.

LAI MA

We keep raising the bar.

FRED RASCOE

The bar of environmental destruction. It's not great. Not great. Hamster wheel is not great. File this set under HD45.P86.

CHARLIE BENNETT

I'm still in the same place.

[DWIGHT TWILLEY BAND, "LOOKING FOR THE MAGIC"]

(SINGING) Ah, la, la la, la

CHARLIE BENNETT

That was "Scary" by Bjork, and I'm led to believe that Bjork has never been played as a solo artist on Lost in the Stacks before.

FRED RASCOE

I don't think so. I looked through the list. We've played Sugarcubes.

CHARLIE BENNETT

Yeah, but no Bjork. No Bjork solo. That's just sort of amazing.

ALEX MCGEE

--kind of embarrassing.

FRED RASCOE

Until today.

ALEX MCGEE

Kind of embarrassing.

FRED RASCOE

Yeah, sorry, Alex.

CHARLIE BENNETT

Alex is back, back in effect. Before Bjork, we had "Duckling Fantasy" by Stove, and we started that set with "Looking for the Magic" by the Dwight Twilley Band or Twilley Band--

FRED RASCOE

Twilley.

CHARLIE BENNETT

--Twilley, songs about unknown forces seeming magical and scary.

FRED RASCOE

This is Lost in the Stacks. We're talking about AI and scholarly publishing with Dr. Lai Ma of the University College in Dublin, Ireland. Current AI tools, when prompted to write academically, often spit out gibberish or maybe what's charitably called "hallucinations."

CHARLIE BENNETT

I have another name for it, Fred.

FRED RASCOE

OK. Is it radio-friendly?

CHARLIE BENNETT

No.

FRED RASCOE

One of the things that Dr. Ma has written about is comparing this kind of gibberish output with a famous hoax of a scholarly paper from 1996 written and published in a respected peer-reviewed journal by Richard Sokal. It was an academic scandal when Sokal revealed it was a hoax. I asked Dr. Ma to talk about the relationship of this hoax paper to AI output.

LAI MA

At that point of time, I think it created a lot of contentions, and even now I would be a little bit afraid in terms of talking to people if I actually mention this particular hoax article because--

FRED RASCOE

You're afraid now, even though it was back--

LAI MA

I would think so.

FRED RASCOE

--decades ago?

LAI MA

It could elicit strong feelings because some people would think that the article was trying to humiliate scholars in the humanities, particularly in the postmodern and poststructuralist approaches, et cetera. But when I kind of encountered this particular, I suppose, episode in history-- and because of AI, I just thought that it was very interesting because this is almost as if I try to feed the great works of a lot of people into the AI and then say, generate an article out of all of these.

So I think at that particular point, obviously, I could only imagine how long it took for the author, for Sokal, to actually put that article together.

FRED RASCOE

Right. I said it was gibberish, but I guess it's not-- because the sentences flow together. It's coherent. But it's not really any real academic thought, but it reads like it could be.

LAI MA

Yeah, and it--

FRED RASCOE

So it's not actually gibberish.

LAI MA

--went through peer review.

FRED RASCOE

Right, yeah, it went through peer review.

LAI MA

It was published.

FRED RASCOE

And so to that point, if we go back to our thought experiment where we imagine an AI that has access to the entire corpus, including that Sokal article, AI has the power not just to generate a literature review but also an entire article.

And so your fear then or one's fear then is that millions of these-- I think in your article you refer to it as the hamster wheel of academic publishing, just thousands upon millions and then just hundreds of millions of similar gibberish articles that sound coherent on the face that could go through peer review. And then we're in a scholarly publishing problem.

LAI MA

That's right because when that article went through peer review-- actually, Sokal, he revealed himself that it was a hoax article, and then it generated a lot of discourse around it and then what's the ethics about it and so on and so forth.

So now my fear, in fact, is that a lot of these articles could be generated, and I think one question would be, if we sometimes cannot even actually make good judgments on our colleagues' manuscripts in the sense that-- and then we are actually seeing so many retractions going on and then the reports of misconduct and so on and so forth, we are already kind of in a crisis in some ways.

Now with these tools, they can be actually generating even more this type of articles and then faster than-- human cannot process. We are not fast enough to process all this, but can you trust AI to process all of this and say, this one is fake and this one is real? I think I am just in a state of, I don't know anymore.

FRED RASCOE

And I don't think that we as a species, or at least the academic species, have enough self-discipline to just totally put the brakes on this. Academic work is hard. Research is hard. And then you have to write about it because it's a job requirement, and a lot of people that have to write about it, a lot of academics-- just because they do the research doesn't mean that they like writing about it or that they're good at it. That's a huge chunk of academia.

And so the idea that you could just push a button and have this program generate it for you because the thing that you're judged on is whether you created something that was published and went through peer review and is there for the community-- so the motivation is there.

LAI MA

Yeah. It's very complicated, as you said, because now the reward system and because academic job market is so competitive. I would say some other markets are also very competitive, but the competitiveness is really pushing people to say, well, what you're looking at would be publications, and then next will be citations. And then obviously, publications go first, and then people will be really trying to publish as many as possible in the fastest way possible.

So these tools really will be saying that-- I still use my dictionary. I use it a lot in terms of like, how do I use this word? As a non-native English speaker, I use it a lot because that really improves how I would kind of-- how I polish my articles, and I try my best there. But now I haven't actually done it myself. I heard that some people just really just put it into ChatGPT. Polish the article for me, and that's done.

And I think it raises a lot of questions in terms of-- I suppose it's the reward system, whether you will reward someone who's really trying to learn and improve themselves. Also, I think there's another question about-- it's a social question as well so in the sense that when you're thinking about in the past, a lot of times when we don't know something, we will try to maybe, oh, email a colleague, ask for some suggestions and comments.

In that particular process, also when we are reading, so when we're actually thinking about AI generating literature review to that particular extent, that means that you're not actually reading the source articles yourselves anymore. And what does that even mean in terms of one's expertise and why people would trust you as an expert when you're actually just relying on AI?

And then a lot of times, I think in my own experiences where I would be kind of like the reading process itself actually prompts questions and sometimes out-of-the-box questions that I don't know how smart AI would be in terms of say, well, actually, there's a little thing that's there that's very interesting, and we haven't actually thought about that.

So I think there are multifaceted ways to really think about this whole development of AI, particularly in academic publishing, but at the same time, I would just say, yes, you're right. The motivation, the reward system is really encouraging people to use it, so what can I do?

FRED RASCOE

Surrender to our AI overlords.

LAI MA

Is that a command?

FRED RASCOE

Right.

[LAUGHTER]

FRED RASCOE

Well, Dr. Lai Ma, I've been so happy to talk to you as a human to a human in real time, with no artificial intelligence intermediary.

LAI MA

Absolutely. That was lovely. Thank you so much for having me.

ALEX MCGEE

This is Lost in the Stacks, and you've been listening to our interview with guest Lai Ma, assistant professor in the School of Information and Communication Studies at University College of Dublin.

CHARLIE BENNETT

File this set under BF1052.B8.

[FRANK DETAILS, "FALSE PRETENSES"]

(SINGING) My God, the spiders are everywhere

CHARLIE BENNETT

"Further Reflections in the Room of Percussion" by Kaleidoscope and before that, "False Pretenses" by Frank Details. This is the most fun-to-read set list in a long time. Those were songs about bad hoaxes and horrifying hallucinations. Our show today was called "Scholarly Publishing on an AI Hamster Wheel."

With all the discourse around how AI is disrupting everything around us, including scholarly publishing, it can be tempting to think that library services or organizations-- excuse me, that library services of organization and preservation might wither away. Today's interview reminded us here at Lost in the Stacks of our interview from November 2016 with artist Julia Scher and a statement she made on the relationships between humans and machines and libraries.

JULIA SCHER

I love the library. I believe that after the next difficulties we have on Earth, like war, famine, art and books will exist. They will still be here. The accumulation of information, the materiality of thoughts into something that can be shared is an ambition we need to hold on to and not just give it over to machines.

CHARLIE BENNETT

Well, we can call that a manifesto, too, I think. All right, roll the credits. Lost in the Stacks is a collaboration between WREK Atlanta and the Georgia Tech Library, written and produced by Alex McGee, Charlie Bennett, Fred Rascoe, and Marlee Givens.

ALEX MCGEE

Legal counsel and a plastic hamster ball that can go anywhere it wants were provided by the Burrus Intellectual Property Law Group in Atlanta, Georgia.

CHARLIE BENNETT

Burrus is going to make the hamsters free.

FRED RASCOE

Special thanks to Lai Ma for being on the show, to Georgia Tech's own Cassidy Sugimoto for advising Lai when she was in grad school, to everyone promoting academic bibliodiversity. And thanks, as always, to each and every one of you for listening.

CHARLIE BENNETT

Our web page is library.gatech.edu/lostinthestacks, where you'll find our most recent episode, a link to our podcast feed, and a web form if you want to get in touch with us. And please don't use ChatGPT to write that.

FRED RASCOE

You would never know.

CHARLIE BENNETT

I would never know. That's why I don't want them to do it.

ALEX MCGEE

Next week, there's a new Artist-in-Residence at the Georgia Tech Library, so we're going to find out what our library looks like through her eyes.

FRED RASCOE

It's time for our last song today, and whatever serious issues may be happening in the wider world, whatever those might be, let's use this last track to welcome the weekend in hopes that all of us can, just for a minute or two, let loose a little bit and stomp out our frustrations on the dance floor. Produced by the great Quincy Jones, who passed away earlier this week, this is "Stomp" by the Brothers Johnson here on Lost in the Stacks. Have a great weekend, everybody.

[BROTHERS JOHNSON, "STOMP"]

Transcript source: Provided by creator in RSS feed.