I'm working on a project compiling various versions of the Bible into a dataset. For the most part, versions separate verses discretely. In some versions, however, verses are combined: instead of verse 16, the marker will say 16-18. Given that I have about 30 other versions that separate verses discretely and could act as a training set, I wonder whether I can train an NLP model to split those combined verses into discrete verses. I'm fairly new to deep learning, having done a few toy projects. How should I think about this problem? What kind of problem is it? I think it might be similar to auto-punctuation problems, and the options there seem to be seq2seq and classification. This makes more sense to me as a classification problem, but maybe my inexperience is what drives me in that direction. Can people suggest ways to think about this problem and resources I might use?
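To make the classification framing concrete, here is a minimal sketch. It assumes plain whitespace tokenization and treats each word as a token labeled 1 if a verse boundary follows it and 0 otherwise. Under that framing, every discretely versed translation yields labeled training data with no manual annotation. The function name `make_boundary_labels` is hypothetical, not from any library.

```python
from typing import List, Tuple


def make_boundary_labels(verses: List[str]) -> List[Tuple[str, int]]:
    """Turn a list of discrete verses into (token, label) pairs.

    label is 1 when the token is the last one in its verse (a verse boundary
    follows it), otherwise 0. Whitespace splitting is an assumption made for
    illustration; a real model would likely use its own subword tokenizer.
    """
    pairs: List[Tuple[str, int]] = []
    for verse in verses:
        tokens = verse.split()
        for i, tok in enumerate(tokens):
            # Only the final token of each verse is marked as a boundary.
            pairs.append((tok, 1 if i == len(tokens) - 1 else 0))
    return pairs


if __name__ == "__main__":
    # NRSV Genesis 2:6 plus the start of 2:7 (truncated), standing in for one
    # discretely versed translation.
    verses = [
        "but a stream would rise from the earth, and water the whole face of the ground\u2014",
        "then the Lord God formed man from the dust of the ground",
    ]
    for token, label in make_boundary_labels(verses):
        print(token, label)
```

This is the same shape of data that token-classification setups for auto-punctuation use, which is one reason the classification framing seems like a natural fit here.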
In answer to questions in the comments: I am dealing only with text, not images. Here is an example:
Genesis 2, New Revised Standard Version:
5 when no plant of the field was yet in the earth and no herb of the field had yet sprung up—for the Lord God had not caused it to rain upon the earth, and there was no one to till the ground; 6 but a stream would rise from the earth, and water the whole face of the ground— 7 then the Lord God formed man from the dust of the ground, and breathed into his nostrils the breath of life; and the man became a living being.
Genesis 2, The Message:
5-7 At the time God made Earth and Heaven, before any grasses or shrubs had sprouted from the ground—God hadn’t yet sent rain on Earth, nor was there anyone around to work the ground (the whole Earth was watered by underground springs)—God formed Man out of dirt from the ground and blew into his nostrils the breath of life. The Man came alive—a living soul!
The goal, then, would be to divide The Message text into discrete verses the way the NRSV does. One clue is that a verse always ends in some kind of punctuation, but that is a necessary condition for a verse boundary, not a sufficient one.
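The punctuation constraint and the verse range together suggest a simple decoding step, sketched here under stated assumptions. Only tokens ending in punctuation are treated as candidate boundaries, and a range such as 5-7 means exactly two internal boundaries must be chosen. The names `candidate_positions`, `split_combined`, and the `punctuation_strength` scorer are hypothetical; `punctuation_strength` is only a placeholder for the per-token probability a trained classifier would produce.

```python
import re
from typing import Callable, List

# Tokens ending in one of these characters may end a verse (assumed set;
# \u2014 is the em dash used in both example translations).
PUNCT_END = re.compile(r"[.,;:!?)\u2014]$")


def candidate_positions(tokens: List[str]) -> List[int]:
    """Indices of tokens after which a verse boundary is allowed.

    The final token is excluded because the end of the passage is always
    a boundary and does not need to be chosen.
    """
    return [i for i, tok in enumerate(tokens[:-1]) if PUNCT_END.search(tok)]


def punctuation_strength(tokens: List[str], i: int) -> float:
    """Placeholder scorer: stronger punctuation gets a higher score."""
    weights = {".": 3.0, ";": 2.0, ":": 2.0, ",": 1.0}
    return weights.get(tokens[i][-1], 0.5)


def split_combined(
    text: str,
    first: int,
    last: int,
    score: Callable[[List[str], int], float],
) -> List[str]:
    """Split a combined passage labeled first-last into last - first + 1 verses."""
    if last < first:
        raise ValueError("verse range is reversed")
    tokens = text.split()
    n_boundaries = last - first  # e.g. 5-7 -> 3 verses -> 2 internal boundaries
    cands = candidate_positions(tokens)
    if len(cands) < n_boundaries:
        raise ValueError("not enough punctuation to place every boundary")
    # Keep the highest-scoring candidates, then restore text order.
    ranked = sorted(cands, key=lambda i: score(tokens, i), reverse=True)
    chosen = sorted(ranked[:n_boundaries])
    verses, start = [], 0
    for i in chosen:
        verses.append(" ".join(tokens[start : i + 1]))
        start = i + 1
    verses.append(" ".join(tokens[start:]))
    return verses


if __name__ == "__main__":
    # Toy passage standing in for a combined range such as 5-7.
    print(split_combined("a b. c, d; e f.", 5, 7, punctuation_strength))
    # prints ['a b.', 'c, d;', 'e f.']
```

Using the known verse count as a hard constraint means the model only has to rank candidate positions rather than decide independently how many boundaries exist, which may make the problem easier than open-ended auto-punctuation.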