Literary Style

Introduction

The Second Isaiah

For much of the twentieth century it was considered an accepted fact amongst Biblical scholars that the prophecy of Isaiah was written in two or more pieces[1]. This was a process of thought that commenced in 1780 with questions regarding Isaiah 50 and which probably reached critical mass when the celebrated and conservative Franz Delitzsch conceded in 1880 that Isaiah 40 and onwards was probably written at the end of the exile[2]. Thus the notion of ‘Deutero-Isaiah’ was born.

The disintegration process, however, went much further. By the early nineteen hundreds it was already common to slice off chapters 56 to 66 into a third piece[3] believed to have been written around 450BC. Yet the knives were now unsheathed and Isaiah became fertile ground for anyone with an opinion to slice and dice as they saw fit. In 1910 Professor George Robinson chronicled various ‘radicals’ who had left only 262 of the 1292 verses of Isaiah to the original prophet.

The modern position has now shifted to a more redaction-based model. This asserts that Isaiah is really a patchwork of many different written and oral records all compiled and woven into one fabric. Depending upon how you look at it this position either asserts the unity of the book or claims that the book is such a complex collection of disparate pieces that one has to treat it as a unity[4].

Of course, running entirely counter to modern critical thought, many believing Christians have been content to take the Word of God at face value and have simply trusted that the book of Isaiah was written by Isaiah. To a large extent I personally view this as the most profitable path; whilst there may be much in the discussion that is academically amusing there is very little that would provide spiritual growth.

The Value of Truth

Nonetheless, as I have dealt with elsewhere at length[5], I do not think the contention over Isaiah is unexpected or insignificant. One of the core arguments against the unity of Isaiah is that it contains certain predictive prophecies that would be completely astounding if they were genuine. The most famous of these is the naming of Cyrus[6] in a book which claims to have been written at least a hundred years before he was born. This would be intriguing enough but the really incredible fact is that the predictions of Cyrus occur in the middle of eight different places in Isaiah[7] where God explicitly states that one of the key things that differentiates Himself from other ‘gods’ is that He knows the future and tells it to people.

Therefore we see that accepting the unity of Isaiah on faith ‘despite all the evidence’ may leave one in a theologically correct position; however it has robbed the believer of one of the more visible proofs God has given of His character and abilities. Therefore to leave the ‘evidence’ out there unchallenged and unquestioned is to concede an important piece of ground to the critics. It may even prove to be a piece of ground that stumbles some of our young[8].

The Scope of This Essay

Notwithstanding the above it is not my intent to launch a deep investigation into, or attack upon, the surgeons of Isaiah. Instead I wish to delve deeper into the exact nature of one of the subject matters that often occurs in these debates: the literary style of Isaiah. More specifically I wish to look at what I shall call the linguistic artifacts that are present in the book. I will discuss this more mathematically under ‘Linguistic Coincidence’ but for this introduction a more layman’s view of the concept may be beneficial.

The vocabulary used by a piece of literature will be driven by three primary factors:

a) The subject matter. This is the most obvious; an article about motorcycle mechanics will refer to various engineering terms, and an article about theology will refer to God and Christ.

b) The culture of the author and intended audience. As an Englishman living in America this is quite noticeable to me. There are words that are common between our languages with identical dictionary definitions in both countries and yet the term may be in common parlance in one country and an esoteric archaism in the other. Word usage often changes over time; words come into fashion and drop from it. It is also quite possible that an author will adopt a style simply for the effect it has upon the target audience.

c) Individual mannerisms and peculiarities that each individual possesses. As any good impersonator will tell you people have physical quirks and mannerisms; some subtle and some less subtle. What is true of the body is also true of the mind. Many of our favorite preachers have phrases or expressions that make us smile because they ‘remind us of them’. These phrases and the frequency of usage are also indicative of the individual.

The argument therefore runs that if Isaiah was written by one individual about a single subject it should have a fairly consistent vocabulary throughout, one which may differ from the vocabulary of other books. In fact, given that the critics assert that their original reason for splitting Isaiah was that it addresses different subjects, and further assert that the books were written for different peoples at different times, the only legitimate reason for any similarity between the books whatsoever is that they were written by the same individual.

It should perhaps also be noted that I emphasized ‘legitimate’ because the other possibility is a forgery or impersonation. This would be a strange claim to make as one of the features of Deutero-Isaiah is supposed to be his advanced enlightenment compared to the previous author. Nonetheless for completeness it is a possibility we will consider in the following.

The Facts

Many conservatives have attempted to seize upon this logic to assert unity of authorship. Robinson asserts that the divine title ‘the Mighty One of Israel’ occurring three times in Isaiah and nowhere else is ‘singular’. He is further impressed that ‘streams of water’ occurs twice in Isaiah yet nowhere else. Many others have also selected individual phrases and repetitions of occurrence and claimed that it showed something.

I would suggest that these fragments of evidence do suggest something but we have no real way of knowing what they suggest until we have a much better understanding of the vocabulary overlap of the Old Testament in general. How common was it for two unrelated books to happen to share a phrase that is otherwise unique?

My intent in this paper and the project it annotates is to construct a set of mathematical metrics to measure the closeness of two Biblical books. In other words: to produce a set of questions which, if objectively answered, give us factual evidence of the literary similarity of two books. Secondly I aim to execute and document these metrics against the Old Testament canon with the hope of producing results which can become an objective factual basis for further conversations of this form.

Methodology

Language

In my opinion one of the first mistakes that many people make when attempting a linguistic analysis of Isaiah, or any other Biblical book for that matter, is that they perform their analysis in an English text. This has to be a mistake for whilst the theological content is accurately rendered in the target language the linguistic artifacts of culture and personal preference will be those of the translator not those of the author in the original language. One only has to compare good sound English translations from different centuries to see the extent to which word choice can deviate wildly even if the semantic content is relatively static.

For this reason I wish to perform this analysis at the level of the Hebrew text. Further I suggest it is useless to simply look at Isaiah because we have no point of reference to define what is reasonable; therefore the whole Hebrew canon will be considered. The New Testament is left out of the analysis because it is written in Greek and the Greek language has different properties from the Hebrew[9].

The Strong’s Number

Of course the downside to studying this in Hebrew is that I don’t speak Hebrew and the programming languages that are out there do not easily lend themselves to processing Hebrew text. The problem moves from difficult to extreme when it is considered that the same word may appear in different lexical representations to denote shift of tense, plurality and gender. Fortunately these problems can be reduced or even removed using one simple device: the Strong’s Number.

The Strong’s number is effectively a numeric encoding of each Hebrew (or Greek) word of the original language. It is usually used to allow easy reference from a translated English word to the original language. A number of translations are available with the Strong’s numbers interspersed with the text. For our purposes it does not matter which translation we use as we can simply remove the English text leaving behind a sequence of numbers.

Whilst the sequence of numbers will then be entirely unreadable to a human, to a machine they will be entirely legible. Further we know that numeric equivalence at the Strong’s level implies word equivalence in the original language. Thus we have turned a potentially complex parsing problem into one of simple numeric comparison.
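As a rough illustration of this step, here is a minimal sketch in Python (the project itself used Perl, so this is not the original program). It assumes a hypothetical markup in which each Strong’s number follows its English word inside angle brackets; real tagged texts use a variety of markups, so the pattern would need adjusting to the file actually used. The name `strongs_numbers` is likewise made up for the example.

```python
import re

# Assumption: Strong's numbers appear as digits inside angle brackets,
# e.g. "In the beginning <07225> God <0430> created <01254>".
# This markup is hypothetical; adapt the pattern to the real source text.
STRONGS_TAG = re.compile(r"<(\d+)>")

def strongs_numbers(tagged_verse):
    """Return the Strong's numbers of one verse as integers, in reading order."""
    return [int(digits) for digits in STRONGS_TAG.findall(tagged_verse)]

sample = "In the beginning <07225> God <0430> created <01254>"
numbers = strongs_numbers(sample)
print(numbers)
```

Converting to integers discards any leading zeros, so two tags for the same word compare equal however they were padded.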

Reductio Ad Absurdum

Perhaps the strangest feature of the methodology, at least for someone as conservative as me, is that the process will start with the presumption that Isaiah is split into two pieces: Isaiah 1-39 and Isaiah 40-66. The reason is that we wish to examine those links that exist between the two halves and see if they are stronger or weaker than the links that exist between other books in the canon. In order to count or measure the links between two entities it is first necessary to have two entities; thus Isaiah has to be split.

For those of you more mathematically or logically inclined, the process I am really using is Reductio Ad Absurdum. By basing the process on an assumption (two Isaiahs) I aim to produce a set of statistics that may suggest that those two pieces are abnormally close. If I do so then that suggests a flaw in the process, which I would suggest is the opening assumption of there being two Isaiahs!

Whether you agree with this approach or not: in the following I shall refer to Deutero-Isaiah by which I mean Isaiah chapters 40-66. The term Isaiah shall refer to the first 39 chapters. Again for clarity – this is an assumption I am making for analytic purposes – it need not[10] reflect my opinion of reality.

Available Materials[11]

Texts that are Available Electronically

One of my principal motivations to consider this problem was the knowledge that it would be possible to produce a lot of raw investigative material using readily available resources. For this project all that is required is a list of Strong’s numbers by verse. However for the purposes of this exercise I acquired the texts as follows:

Programming Language Selection for Text Processing

Whilst the language processing that this project attempts is theoretically advanced the programming concepts involved are not. Reading some twenty megabytes of text is something that a home PC can do in under a second using almost any language. The process is linear and can be accomplished in batch mode. Further the eventual amount of coding will be trivial by modern standards. Thus most of the advances made in programming language theory and practice in the last twenty years are not really going to be required to tackle this project.

As there was no compelling requirement for any given approach I instead deferred to a couple of pragmatic considerations. Firstly it may be useful to be able to showcase the results of this work; this is most readily done on the Internet. Secondly I already have a substantial amount of Perl code that processes Biblical texts that I use for my own website. The result of these two considerations is that I chose to use Perl for this exercise; almost any language could be used if the project were undertaken seriously.

Legal Considerations

There is no copyright on the KJV outside of the United Kingdom; I do not know of any restrictions placed upon the Strong’s encoding of the KJV. It should be noted that the KJV text itself is only used for ease of reference so obtaining an entirely perfect version is not required for this project.

Formatting for Text Processing

Whatever the source of the textual version a necessary precursor to performing linguistic analysis is to transform the text into a ‘processing friendly’ format to allow downstream processes to occur independent of the format of the input data[12]. This process is usually referred to as ingestion and the code normally has to be written for every file that is going to be used. Ideally the process is run once at the start of the project and the output of the process is then used for the remainder of the project.

The format I have personally standardized upon has one verse per line and requires the lines to be ordered to follow the Biblical sequence. Each verse is preceded by an 11 character descriptor that defines the verse that follows. The format is BB:CCC:VVV where BB is a two character book number, CCC is three characters for the chapter number and the VVV is three characters for the verse number. Thus 01:001:001 corresponds to Genesis chapter 1 verse 1. In this particular case the input text actually lists the books in alphabetic order so the text had to be processed 17 times to emit the books in the sequence I wanted.

For this particular use I changed my normal format in two ways. Firstly I spotted chapter 40 and onwards of Isaiah and moved it into a new book 67. Secondly, as I am only using the Strong’s numbers, I removed all of the English text and Hebrew annotations to leave a simple stream of numbers for each verse.
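The descriptor and the Isaiah split can be sketched as follows, again in Python and not as the original Perl. The sketch assumes that Isaiah is book 23 (its position in the usual 66-book ordering) and that chapter numbers are left unchanged when chapters 40 onwards are moved to book 67; the text states neither point explicitly. The function names are invented for the example.

```python
ISAIAH = 23          # assumption: Isaiah's position in a 66-book ordering
DEUTERO_ISAIAH = 67  # the new book number given in the text

def verse_key(book, chapter, verse):
    """Format a reference as BB:CCC:VVV, e.g. 01:001:001 for Genesis 1:1."""
    return f"{book:02d}:{chapter:03d}:{verse:03d}"

def split_isaiah(book, chapter):
    """Return the book number after moving Isaiah 40 onwards into book 67."""
    if book == ISAIAH and chapter >= 40:
        return DEUTERO_ISAIAH
    return book

print(verse_key(1, 1, 1))
print(verse_key(split_isaiah(23, 40), 40, 1))
```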

The Analytic Results

One point worth noting is that all of the observations that follow are the result of one 180 line Perl program written and tested over the course of one week. Computer science moves rapidly; sixteen years ago I spent six months compressing the Bible text down so that it could realistically be loaded onto the PC of the day. Analytic and linguistic research is now programmatically available to just about everyone.

The Size of the Books

The first part of the puzzle we need if we are to form a mathematical model for the relationships between two books is a measure of the size of each book. This is because, all other things being equal, the chances of something odd occurring in a book[13] should be proportional to the length of the book. The traditional way of measuring a book tends to be in terms of verses. This certainly is a good approximation to the length of the book but it is also somewhat arbitrary as the verses do not appear in the original Hebrew. Instead therefore I intend to measure the total number of words each book contains.

There is another measure which may well be interesting too and that is the vocabulary size[14] of each book. One might expect there to be a rough correlation that longer books will have larger vocabularies. The size of vocabulary however can also be a strong indicator of literary style[15]. The aim would therefore be to plot a graph of book length to vocabulary size and see where the various books fall.
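Both measures are simple to compute once each book is a list of Strong’s numbers. The following Python sketch is only illustrative; `book_sizes` and the toy data are invented, and the real input would be the numbers extracted from the text.

```python
def book_sizes(books):
    """Map each book to a pair: (total words, distinct words).

    `books` maps a book name to its list of Strong's numbers in reading order.
    """
    return {name: (len(words), len(set(words))) for name, words in books.items()}

# Toy data standing in for real books.
toy = {"A": [1, 2, 2, 3, 1], "B": [4, 4, 4]}
sizes = book_sizes(toy)
print(sizes)
```

Plotting the first element of each pair against the second gives the kind of length-to-vocabulary graph described here.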

The graph above shows all of the Old Testament books plotted by Book Length and Unique Word count. For the shorter books the number of unique words is about a third of the number of words. Then as the books get longer the growth in vocabulary size slows; Ezekiel, Genesis, Jeremiah and Psalms are the four longest books. In addition to the trend it is useful to note the outliers. Points towards the bottom right of the graph have abnormally few unique words for their length; thus Leviticus stands out as having a low vocabulary. Heading towards the top left we find that Isaiah is exceptional for having a very large vocabulary for its (remaining) length. 1 Chronicles and Job come in second and third; then Deutero-Isaiah is a stand-out amongst the mid-length books for vocabulary size.

Two Book Unique Words

One of the crudest measures of distance between two books is the number of words that exist only within those two books. This will identify common subject matter and it may point to some idiosyncrasy of the author. Of course the chance of two books sharing an otherwise unique word increases with the length of the books. Thus for every pair of books I counted the number of words that they and they alone use. I also scaled that number for the size of the books by taking the number of shared words, multiplying it by the square of the average book length and dividing it by the two book lengths. Thus if the average book length was 1000 words and I had 15 co-occurrences between two books of length 1200 and 800 the scaled result would be 15 * 1000 * 1000 / 1200 / 800, or about 15.6.

I computed these numbers for every pair of books in the Old Testament; however I will present only three tables. The first is the top ten pair matches across all books, which will allow us to verify that the measure is somehow meaningful. The second and third are the top six pair matches for Isaiah and the top six pair matches for Deutero-Isaiah. These will allow us to see how close the two books are to each other and also whether they are both close to the same kinds of other books.
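The count and the scaling can be sketched in a few lines of Python. This is an illustration under the definitions given in the paragraph above, not the original Perl program; the function names and the toy books are invented.

```python
from collections import defaultdict

def two_book_unique(books):
    """Count, for each pair of books, the items found in those two books only.

    `books` maps a book name to a list of hashable items (e.g. Strong's numbers).
    """
    homes = defaultdict(set)            # item -> names of the books containing it
    for name, items in books.items():
        for item in items:
            homes[item].add(name)
    counts = defaultdict(int)
    for found_in in homes.values():
        if len(found_in) == 2:          # present in exactly two books
            counts[tuple(sorted(found_in))] += 1
    return dict(counts)

def scaled(count, len_a, len_b, average_len):
    """Scale a raw count: count * average^2 / (len_a * len_b)."""
    return count * average_len ** 2 / (len_a * len_b)

toy = {"A": [1, 2, 3], "B": [3, 4], "C": [4, 5, 1, 9], "D": [1]}
print(two_book_unique(toy))
print(scaled(15, 1200, 800, 1000))
```

In the toy data word 1 appears in three books and so counts for no pair, which is the behaviour the measure requires.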

| Book | Book | Co-occurrences | Scaled Co-occurrences | Comments |
| --- | --- | --- | --- | --- |
| Ezra | Daniel | 151 | 300 | This is the most significant tie-up both actually and scaled. I was rather surprised when I first saw the linkage. One doesn’t think of them as similar books. However they were both written by men that had spent substantial time in a Persian court. This number would suggest that the Hebrews of the exile developed a new section of vocabulary not shared with the earlier prophets. |
| Genesis | 1Chronicles | 105 | 26 | Of course a sizeable genealogy or two can easily cause high correlation. |
| Ezra | Nehemiah | 75 | 178 | This is the second highest scaled result and third highest result. Ezra and Nehemiah are of course known to be closely related books. An expected and encouraging result. |
| 2Samuel | 1Chronicles | 37 | 17 | Another confirmatory result. Both books are known to focus upon King David and therefore share some common core vocabulary. |
| 1Chronicles | Nehemiah | 21 | 18 | This one also caused me to frown until I checked in my study Bible[16]. 1 Chronicles is believed to have been completed around 425BC and Jewish tradition assigns it to Ezra. Whilst not the same book this statistic suggests the same time and setting. |
| 2Kings | Isaiah | 19 | 8 | Having just studied Isaiah I was ready for this one. The account in 2Ki of Hezekiah and that in Isaiah are clearly very close; this is reflected in an amount of common core vocabulary. |
| 1Chronicles | 2Chronicles | 19 | 6 | The correlation between these two books in time and content is well known; it should be no surprise that they are also linked linguistically. |
| Leviticus | Deuteronomy | 18 | 7 | Again a correlation between two books known to have been written at a similar time by the same person. |
| Job | Psalms | 17 | 5 | This one is interesting and may suggest that some of the Psalms came from the region of Job. It could also suggest some common wisdom vocabulary. However the low scaled result should be noted. It could just suggest that Psalms is a big book. |
| Joshua | 1Chronicles | 16 | 8 | This correlation is probably historic. 1 Chronicles briefly recounts the history of the time up to David and Joshua is the only other book covering the history of the invasion of Canaan. |

In many ways this table contains no news; or at least very little that was not already available. However this is good news. It suggests that the measurement of unique words between two books does tend to correlate with known links between the two books. This adds some validity to the measure. We have also seen linkages due to subject matter[17], culture[18] and probably author[19].

Perhaps the one slightly disappointing thing for the conservatives is that Isaiah and Deutero-Isaiah do not make it into the top 10 linked books. In fact the two books would appear at number 35 on this list. Looking at the books that are linked to Isaiah suggests why:

| Book | Book | Co-occurrences | Scaled Co-occurrences |
| --- | --- | --- | --- |
| 2Kings | Isaiah | 19 | 8 |
| Isaiah | Jeremiah | 16 | 4 |
| Psalms | Isaiah | 13 | 3 |
| Job | Isaiah | 12 | 8 |
| Proverbs | Isaiah | 10 | 7 |
| Isaiah | Deutero-Isaiah | 8 | 6 |

We have already noted the 2Kings passage that corresponds to Isaiah. Then we find Jeremiah who predicted and lived through the fall of Jerusalem. This is obviously the same fall that Isaiah predicted. We then find Isaiah drawing his vocabulary from the, admittedly large, body of wisdom literature most of which had been written in Jerusalem some hundred and fifty years before. Bringing up sixth place, although fourth in terms of significance, is Deutero-Isaiah.

Looking at the table for Deutero-Isaiah is equally instructive:

| Book | Book | Co-occurrences | Scaled Co-occurrences |
| --- | --- | --- | --- |
| Psalms | Deutero-Isaiah | 13 | 5 |
| Isaiah | Deutero-Isaiah | 8 | 6 |
| Job | Deutero-Isaiah | 7 | 6 |
| Proverbs | Deutero-Isaiah | 6 | 6 |
| Exodus | Deutero-Isaiah | 6 | 3 |
| Deutero-Isaiah | Jeremiah | 6 | 2 |

Of the top four co-occurrences for Deutero-Isaiah we find three of the same wisdom books that featured for Isaiah. Also in second place we find a link to Isaiah. In sixth place we find a link to Jeremiah (which was second placed for Isaiah). The one new book we find is a throwback to the book of Exodus; Exodus had occupied the eighth spot for Isaiah.

What we therefore see is that Isaiah and Deutero-Isaiah share words with each other but also with the wisdom books and Jeremiah. As we are looking for words unique between two books the fact that we appear to have a cluster of books using similar language will actually reduce the chances of any two of them having a unique pairing.

Additionally the very power of looking for unique words in pairs of books is also its greatest weakness. These words are by definition oddities: they occur in low numbers. Therefore there is a danger that the noise of randomness[20] will actually distort some of the truth in the underlying data. Fortunately both of these problems can be somewhat ameliorated by altering our concept of a word.

Two Book Unique Word Pairs

Counting the number of words that only occur within two books does make sense but it assumes that words appear independently within text. This is obviously not true; any word in a given sentence often has an explicit grammatical or semantic link to the word next to it. For example nouns are often preceded by an adjective. Verbs are often preceded by an adverb. The rules for Hebrew and English are different and frankly I do not know them well enough to produce all of the meaningful word pairs from a sentence. Given we are looking for oddities, however, we can simply produce a list of all of the word pairs; those that arise by pure chance have an extremely low probability of being found in another book.

For clarity I will give a small example of what I am doing:

 

The large dog bit the small cat

 

Will produce a sequence of word-pairs thus:

 

the large, large dog, dog bit, bit the, the small, small cat

 

A grammarian would tell you to drop the pairs using the article (the). However my assumption is that articles are sufficiently common that the act of looking for uniqueness will implicitly drop them out unless they are used in an odd context in which case they are interesting anyway!
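The pairing step itself is tiny. A minimal Python sketch (illustrative only, with an invented function name) that reproduces the example above:

```python
def word_pairs(words):
    """Return every pair of adjacent words, in reading order."""
    return list(zip(words, words[1:]))

sentence = "the large dog bit the small cat".split()
pairs = word_pairs(sentence)
print(pairs)
```

Each pair is a tuple, so pairs can be collected into sets and counted across books in the same way as single words. Whether pairs should run across verse boundaries is a choice the text does not specify; applying the function one verse at a time keeps them within a verse.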

There are far more individual word pairs than individual words and thus there are far more instances where a word-pair is only extant in two books. In fact the numbers go from about fourteen hundred instances to just over eleven thousand. This will help to even out any random noise. We shall now proceed to look at the three Top tables again.

| Book | Book | Occurrences | Scaled Occurrences | Comments |
| --- | --- | --- | --- | --- |
| 1Kings | 2Chronicles | 611 | 202 | Parallel narrative of same period |
| 2Samuel | 1Chronicles | 329 | 152 | Parallel narrative of same period |
| 2Kings | Isaiah | 317 | 148 | Shares narrative of Hezekiah |
| 2Kings | 2Chronicles | 288 | 101 | Parallel narrative of similar period |
| Genesis | 1Chronicles | 252 | 62 | Sharing some major genealogies |
| Ezra | Nehemiah | 209 | 496 | Accounts written at similar time about similar subject possibly with new vocabulary. |
| Exodus | Numbers | 199 | 42 | Continuing narrative of same period |
| Leviticus | Numbers | 176 | 55 | Parallel accounts of same period |
| 2Kings | Jeremiah | 174 | 40 | Cover same period |
| Exodus | Leviticus | 147 | 47 | Parallel accounts of same period |
| 1Samuel | 2Samuel | 140 | 59 | Subsequent accounts of similar events |

To me this Top Ten table is a little breathtaking. Leaving aside the mathematics for a moment, take a look at those book pairs and ask yourself how many times you have been searching for a fact or verse and not been sure which of a given pair of books it was in. My guess is that many of those ‘not quite sure’ moments would involve one of the book pairs above.

The other most noteworthy change from the first table is that the Ezra – Daniel link has now dropped[21]. This suggests that the uniqueness of the Persian derived vocabulary is now diminishing in significance compared to similarity of subject matter. The table above shows that all of the pairs are now narrating identical or immediately subsequent events. This pattern is followed if we look at the table for Isaiah:

| Book | Book | Occurrences | Scaled |
| --- | --- | --- | --- |
| 2Kings | Isaiah | 317 | 148 |
| Isaiah | Jeremiah | 80 | 22 |
| Psalms | Isaiah | 79 | 21 |
| Isaiah | Deutero-Isaiah | 44 | 34 |
| Isaiah | Ezekiel | 43 | 13 |
| Job | Isaiah | 34 | 22 |

We find that Job and Proverbs both drop down a couple of places[22]; Psalms retains its place although scaled it drops down to fifth. 2Kings and Jeremiah remain in place and Deutero-Isaiah moves up as does Ezekiel (which also narrates the fall of Jerusalem). It should be noted that in terms of book-size the Isaiah to Deutero-Isaiah link is now second only to 2 Kings and Isaiah.

The table for Deutero-Isaiah follows the pattern although with one interesting surprise:

| Book | Book | Occurrences | Scaled |
| --- | --- | --- | --- |
| Psalms | Deutero-Isaiah | 130 | 50 |
| Deutero-Isaiah | Jeremiah | 76 | 30 |
| Isaiah | Deutero-Isaiah | 44 | 34 |
| Deutero-Isaiah | Ezekiel | 37 | 16 |
| Genesis | Deutero-Isaiah | 34 | 13 |
| Job | Deutero-Isaiah | 32 | 31 |

Firstly we should note that five of the six links are identical to Isaiah showing that they are part of the same language clustering. In terms of significance the Deutero-Isaiah to Isaiah link is second place as in the Isaiah table. The tie to Psalms has actually strengthened whilst the links to Job and Proverbs have weakened. This could denote a move towards more poetic or even florid language.

The new book is Genesis[23]. This could just be noise; the scaled value is low as Genesis is a large book. However it may also be suggestive. An argument for Deutero-Isaiah is that it has a global view of God unseen earlier in Hebrew thought (or so it is claimed). My counter argument is that globalism is the precise view of God portrayed in the Bible up until the call of Abraham. It may well be that the latter parts of Isaiah are not introducing a new concept (and language) but simply moving back to the concepts laid out very early in scripture.

Two Book Unique Three Word Sequences

 

It would be nice if one could run a similar algorithm to detect a correlation between phrases or idioms. However, the question as to when a sequence of words turns into a known phrase is an area of ongoing research within the data sciences. One of the latest concepts is confabulation theory[24] which is largely the work of Hecht-Nielsen[25]. This is a mathematical model that uses conditional probability in an attempt to detect a sequence of words that is being used sufficiently often that it forms a phrase. Unfortunately it requires a corpus of billions of words to train the model well enough to make it predictive[26]!

Fortunately for us we are not trying to find phrases everyone knows; rather those known to a relatively small number of people. Therefore I will simply produce lists of all of the three word sequences and see which ones fall into two books. Then as before I will simply assert that the fact that they are used in two places suggests that they form a meaningful unit[27].
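A minimal Python sketch of this approach follows; it is an illustration and not the original Perl. The names `sequences` and `in_exactly_two_books` and the toy data are invented, and the sketch takes ‘fall into two books’ to mean exactly two, matching the earlier measures.

```python
from collections import defaultdict

def sequences(words, n=3):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def in_exactly_two_books(books, n=3):
    """Map each n-word sequence found in exactly two books to that pair of books."""
    homes = defaultdict(set)            # sequence -> names of the books containing it
    for name, words in books.items():
        for seq in sequences(words, n):
            homes[seq].add(name)
    return {seq: tuple(sorted(names))
            for seq, names in homes.items() if len(names) == 2}

# Toy books of Strong's-style numbers.
toy = {
    "A": [10, 20, 30, 40],
    "B": [5, 10, 20, 30],
    "C": [20, 30, 40, 50],
}
shared = in_exactly_two_books(toy)
print(shared)
```

Returning the sequences themselves, and not just a count, makes it possible to look each one up and judge whether it really is a meaningful unit.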

There are far fewer hits than for two word pairs; this is not surprising. It would be quite a coincidence for someone to string three words together by chance and get the same as another person. However one might hope that this will not introduce as much noise as the individual word comparisons did. This is because three words in a sequence have to obey rules of grammar and semantics and they will have a well formed meaning; thus they will not occur randomly almost by definition.

This table is sufficiently similar to the one for two word phrases that it is worth noting the movers to get an indication as to what is occurring: