
I'm currently researching and building a chatbot to link to an existing intelligent tutoring system (ITS) for the domain of relational databases. The current ITS is rudimentary (forms with checkboxes and the like, essentially a quiz), so almost any dialogue model would be an improvement over just reading a textbook or sitting passively in class. The main problem is that I have no corpora to work with beyond the hardcoded questions, and no time at all to build a better corpus from scratch.

My thesis adviser gave me the starting point for a dialogue model: the chatbot would ask a question and the student would try to answer. Depending on the feedback, the student could ask "how", "why", or "explain".
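To make the adviser's dialogue model concrete, here is a minimal sketch of one tutoring turn. All question text, keywords, and canned follow-up replies are invented placeholders, not from any real corpus; the keyword match is a deliberately crude stand-in for real answer assessment.

```python
# Minimal sketch of the adviser's dialogue model: the bot asks a question,
# the student answers or probes with "how"/"why"/"explain".
# All content here is hypothetical example data.

FOLLOW_UPS = {"how", "why", "explain"}

class DialogueTurn:
    """One tutoring exchange: the bot asks, the student answers or probes."""

    def __init__(self, question, expected_keywords, follow_up_texts):
        self.question = question
        self.expected_keywords = expected_keywords  # crude answer check
        self.follow_up_texts = follow_up_texts      # canned probe replies

    def respond(self, student_input):
        text = student_input.strip().lower()
        if text in FOLLOW_UPS:
            return self.follow_up_texts.get(text, "No elaboration available.")
        hits = sum(1 for kw in self.expected_keywords if kw in text)
        if hits > len(self.expected_keywords) // 2:
            return "Correct!"
        return "Not quite. You can ask 'how', 'why', or 'explain'."

turn = DialogueTurn(
    question="What does a primary key guarantee in a relation?",
    expected_keywords=["unique", "identif"],
    follow_up_texts={
        "why": "Without a unique identifier, tuples could not be told apart.",
        "how": "Declare it with PRIMARY KEY in the CREATE TABLE statement.",
        "explain": "A primary key is a minimal set of attributes whose "
                   "values uniquely identify each tuple.",
    },
)
print(turn.respond("it uniquely identifies each row"))  # Correct!
print(turn.respond("why"))
```

Each hardcoded question from the existing quiz-style ITS could be wrapped in one such turn, which keeps the dialogue model independent of wherever the corpus eventually comes from.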

There is research showing that using theoretical frameworks to guide the learning goals of a chatbot can help students learn more effectively by moving them from passive to active learning. Understanding the learning objectives before starting development is essential, and a theoretical framework can guide the dialogue modeling.

This happens, for instance, with CodingTutor, an agent that helps students learn to program. It uses the revised Bloom's taxonomy to combine different types of knowledge. Even so, it is complex: there are many types of questions and answers, plus other interactions such as solving a programming exercise. The CodingTutor approach could be an option if I weren't short on time.

On the other hand, another ITS with a chatbot module was built from the users' own questions to a previous iteration of that chatbot, plus Google queries. The learning goal was only memory retention. The students asked "Why", "Advantages/Disadvantages", "Application of", "How", and "Who" questions. This is a more feasible approach than CodingTutor's, but with a caveat: I have no corpus.

There are other ITSs, like AutoTutor, that use expectation and misconception tailored (EMT) dialogue. Still, the complexity is greater, and I would have to hardcode every single misconception for every single expectation.
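A tiny sketch shows why that authoring burden is the dealbreaker: in EMT, each expectation carries its own hand-written misconception list and remediation text. The entry below is invented example content, not taken from AutoTutor.

```python
# Hedged sketch of EMT-style authoring: every expectation needs its own
# hand-written misconceptions and feedback. Content here is hypothetical.

emt_script = {
    "A foreign key references a candidate key in another relation": {
        "misconceptions": {
            "foreign key must be unique":
                "Foreign keys may repeat; only the referenced key is unique.",
            "foreign key cannot be null":
                "Unless declared NOT NULL, a foreign key may be null.",
        },
        "hint": "Think about which side of the reference enforces uniqueness.",
    },
    # ...one such entry per expectation, each written by hand
}

def diagnose(expectation, student_answer):
    """Return targeted feedback if the answer matches a known misconception."""
    entry = emt_script[expectation]
    for wrong, feedback in entry["misconceptions"].items():
        if wrong in student_answer.lower():
            return feedback
    return entry["hint"]
```

With dozens of expectations per topic and several misconceptions each, the script grows quadratically in authoring effort, which is exactly the time I don't have.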

My question is whether there are corpora I could use for the simple learning objective of memory retention, using "why", "how", and "explain", for the topic of relational databases.


EDIT: since this is thesis research, according to my adviser the specific topic (relational databases) isn't fundamental; the tutor just has to be useful for some domain, for instance K-12 geography. The chatbot is a complementary module to an intelligent tutor, and that tutor is still so bare bones that any corpus, with some learning objectives in mind, would suffice. Also, the long-term goal is for the tutor to be cross-domain (not open-domain), i.e., adaptable to different domains in a standalone fashion (different training sets, different corpora, the works).

However, given that there isn't any corpus for this intelligent tutor, ANY corpus would be better than nothing. So a corpus of relational-database Q&A created with ChatGPT would be enough (with all the problems that inherits: bias, incorrect answers, etc.). Right now the "how to" matters more than perfect accuracy; in other words, the corpus only has to be accurate enough.
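One way to bootstrap such a corpus is to expand a concept list into "why"/"how"/"explain" question stubs and let an LLM fill in the answers afterwards (which then still need an accuracy pass). A minimal sketch, with an invented concept list and templates; the `answer` field is deliberately left empty here rather than fabricating content:

```python
import json

# Hedged sketch of corpus bootstrapping: expand concepts into
# "why"/"how"/"explain" question stubs. Concepts and templates are
# illustrative; answers would be generated by an LLM and then reviewed.

CONCEPTS = ["primary key", "foreign key", "normalization", "SQL join"]

TEMPLATES = {
    "why": "Why is {c} important in relational databases?",
    "how": "How does {c} work in a relational database?",
    "explain": "Explain the concept of {c}.",
}

def build_corpus(concepts):
    """One record per (concept, question type); 'answer' left for the LLM."""
    return [
        {"concept": c, "type": t, "question": tpl.format(c=c), "answer": None}
        for c in concepts
        for t, tpl in TEMPLATES.items()
    ]

corpus = build_corpus(CONCEPTS)
print(len(corpus))  # 12 records: 4 concepts x 3 question types
print(json.dumps(corpus[0]))
```

Serializing one JSON record per line makes it easy to review and correct answers incrementally, and swapping in a geography concept list keeps the same pipeline cross-domain.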

  • Try to use GPT-4, it has the knowledge you want to teach, and the prompting would be fairly easy. – Volker Siegel Mar 28 '23 at 17:47
  • @VolkerSiegel actually to make a corpus for "why" "how" and "explain" it would be easy to use ChatGPT. I would gather several questions about relational databases and create a corpus like that. According to my adviser, it would be better than it actually is, because there is no corpus. – kokumajutsu Mar 29 '23 at 13:19
  • I was thinking of using GPT-4 as the chatbot directly. I am quite sure that it already has the knowledge, you do not need data to teach it, only prompts to ask it. But I am sure GPT-4 can just create a corpus. How large should it be? ChatGPT (GPT-3.5) can do that too, but has a shorter context length (4000 vs 8000) The GPT-4 with 32,000 is not yet available. But you can work around that limit by splitting the task. – Volker Siegel Mar 29 '23 at 18:34

0 Answers