Studi sul Cristianesimo Primitivo

Investigating the Authenticity of Pliny the Younger’s Letter to Trajan Concerning the Christians

Saulnier
Posted on 15/2/2016, 21:23

The peer-reviewed journal "Digital Scholarship in the Humanities" (Oxford University Press) has just published my article entitled:

An Application of a Profile-Based Method for Authorship Verification: Investigating the Authenticity of Pliny the Younger’s Letter to Trajan Concerning the Christians

http://dsh.oxfordjournals.org/content/earl...2/12/llc.fqw001

The abstract below:

Pliny the Younger's letter to Trajan regarding the Christians is a crucial subject for the studies on early Christianity. A serious quarrel among scholars concerning its genuineness arose between the end of the 19th century and the beginning of the 20th; per contra, Plinian authorship has not been seriously questioned in the last few decades. After analysing various kinds of internal and external evidence in favour of and against the authenticity of the letter, a modern stylometric method is applied in order to examine whether internal linguistic evidence allows one to definitely settle the debate. The findings of this analysis tend to contradict received opinion among modern scholars, affirming the authenticity of Pliny’s letter, and suggest instead the presence of large amounts of interpolation inside the text of the letter, since its stylistic behaviour appears highly different from that of the rest of Book X.

In the article there are references to the Proceedings of the First Conference on ‘Studi sul Cristianesimo Primitivo, 2007-2014’, Venice, Italy, September 2014, organised by the administrators of this forum.
 
Administrator (Roma)
Posted on 17/2/2016, 13:44

Sounds interesting...! (Although I have always considered it authentic, I'd be willing to change my mind.)
Waiting for the paper from you by PM :)
 
maquanteneso
Posted on 17/2/2016, 16:43

QUOTE (Saulnier @ 15/2/2016, 21:23)
An Application of a Profile-Based Method for Authorship Verification: Investigating the Authenticity of Pliny the Younger’s Letter to Trajan Concerning the Christians

I apologise in advance for my English; still, sometimes it's fun to talk in English among Italians.
I saw the letter and I noticed that it's very short. This is usually a big problem for authorship attribution programs.
What techniques did you use?
 
Saulnier
Posted on 17/2/2016, 17:53

QUOTE (maquanteneso @ 17/2/2016, 16:43)
I saw the letter and I noticed that it's very short. This is usually a big problem for authorship attribution programs.
What techniques did you use?

The method applied is the one proposed by Potha and Stamatatos (2014): A Profile-Based Method for Authorship Verification (Proceedings of the 8th Conference on Artificial Intelligence: Methods and Applications, Ioannina, Greece, May 2014)
www.icsd.aegean.gr/lecturers/stamatatos/papers/SETN2014.pdf

Regarding the length of the letter and the problem it poses for authorship verification, great progress has been made in the last few years, with very good results obtained even for texts much shorter than Pliny’s letter:

• Brocardo, M. L., Traore, I., Saad, S., and Woungang, I. (2013). Authorship Verification for Short Messages using Stylometry. Proceedings of the IEEE International Conference on Computer, Information and Telecommunication Systems, Piraeus-Athens, Greece, May 2013.
• Brocardo, M. L., Traore, I., and Woungang, I. (2014). Authorship verification of e-mail and tweet messages applied for continuous authentication. Journal of Computer and System Sciences, 81: 1429–40.
• Chen, X., Hao, P., Chandramouli, R., and Subbalakshmi, K. P. (2011). Authorship Similarity Detection from E-mail Messages. Proceedings of the 7th International Conference on Machine Learning and Data Mining in Pattern Recognition, New York, NY, August–September 2011.
 
Simone Emili
Posted on 18/2/2016, 00:16

QUOTE (maquanteneso @ 17/2/2016, 16:43)
sometimes it's fun to talk in English among Italians.

Holy words!! :-)
 
Domics
Posted on 25/2/2016, 09:48

Is it possible to know what the main interpolations are? Was it originally a letter that did not refer to the Christians at all, or have some passages on the Christians been added? Thanks
 
Domics
Posted on 26/2/2016, 13:07

Dr. Tuccinardi, when in your paper "La tradizione testuale del libro X delle epistole di Plinio: una proposta alternativa" you write that Book X "si distingue nettamente anche per lo stile" ("is also clearly distinguished by its style"), what exactly do you mean? What, in your view, is the reason for such a different style? Thanks
 
Saulnier
Posted on 26/2/2016, 21:41

QUOTE (Domics @ 26/2/2016, 13:07)
Is it possible to know what the main interpolations are? Was it originally a letter that did not refer to the Christians at all, or have some passages on the Christians been added? Thanks

Dr. Tuccinardi, when in your paper "La tradizione testuale del libro X delle epistole di Plinio: una proposta alternativa" you write that Book X "si distingue nettamente anche per lo stile" ("is also clearly distinguished by its style"), what exactly do you mean? What, in your view, is the reason for such a different style? Thanks

The method for authorship verification that I used was proposed by Potha and Stamatatos (2014).

www.icsd.aegean.gr/lecturers/stamatatos/papers/SETN2014.pdf

Taken at face value, the findings of my analysis would mean that Ep. 10.96 should be excluded from Book 10, but I’m not an advocate of this extreme solution, because I’m fully aware that large insertions in the letter might explain the anomaly (the abnormal length of the letter compared with all the other letters of Book 10 points in the same direction).
Unfortunately, this method does not allow the interpolations themselves to be identified.

I will try to explain in plain language, albeit simplifying a bit, how this method works.
First of all, it is useful to read this summary by Neil Godfrey at Vridar:

http://vridar.org/2016/02/17/fresh-doubts-...the-christians/

First, the Plinian profile Lk (the profile of the text of known authorship) is built from the text of Book 10. Lk is the list of the most frequent character n-grams in Book 10, sorted in descending order of frequency. The most frequent character n-grams carry information about an author’s stylistic peculiarities. Lk=500 means that only the 500 most frequent character n-grams are considered in the analysis.
Then PT (the Plinian Testimonium, Ep. 10.96) was isolated from Book 10, and the text of the remaining letters was divided into fifteen sections of about the same length as PT. These 15 subsections, taken from text we know to be reliably Plinian, show what to expect from genuine Plinian fragments of the same size as the disputed document; their results are then compared with those obtained for PT.
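The cutting step can be sketched roughly as follows. This is only an illustration in Python, not the procedure used in the article: the function name `split_into_chunks` and the choice to break at the nearest preceding space are my own assumptions.

```python
def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Cut text into consecutive chunks of at most chunk_size characters.

    Assumption for this sketch: each cut is moved back to the last space
    inside the window, so that words are not split across chunks.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Look for a space to cut at; keep the hard cut if none is found.
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        chunks.append(text[start:end].strip())
        start = end
    # Drop chunks that were whitespace only.
    return [chunk for chunk in chunks if chunk]
```

With the remainder of Book 10 in a string, one would call `split_into_chunks(remainder, 3000)` and keep the pieces whose length is close to that of PT; how a short final piece is handled is left open here.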
Next, a profile is created for each subsection (PT and P1-15). These profiles are called Lu, i.e. the profile of the text of unknown authorship. Lu is the list of all the n-grams found in the corresponding subsection. Intuitively, the intersection between Lk (from Book 10) and each Lu, i.e. the common n-grams (CNG), gives information about the authorship of that subsection. This intersection (the number of n-grams common to the two profiles) will of course be larger if the subsection has the same author as Book 10.
The parameter “measuring” this intersection is called SPI. For each model considered, the SPI values of the 15 Plinian subsections are homogeneous and consistent with a normal distribution. This is not at all surprising, because the stylistic homogeneity of Book 10 has long been recognised even without stylometric tools.
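To make the idea of the two profiles and their intersection concrete, here is a minimal Python sketch. It is a simplification under stated assumptions, not the code behind the article: the function names are hypothetical, and the raw count of common n-grams stands in for SPI, whose exact normalisation in the paper is not reproduced here.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """All overlapping character n-grams of text, in order of appearance."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def known_profile(text: str, n: int, size: int) -> set[str]:
    """Lk: the `size` most frequent character n-grams of the known text."""
    counts = Counter(char_ngrams(text, n))
    return {gram for gram, _ in counts.most_common(size)}


def unknown_profile(text: str, n: int) -> set[str]:
    """Lu: every distinct character n-gram of the text under test."""
    return set(char_ngrams(text, n))


def common_ngrams(lk: set[str], lu: set[str]) -> int:
    """Size of the intersection of the two profiles (the CNG count)."""
    return len(lk & lu)
```

For one model one would fix `n` and the profile size (for instance 500), build `known_profile` once from Book 10, and then compute `common_ngrams` for PT and for each of the 15 subsections.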
An exhaustive analysis has been carried out by Gamberini (Stylistic Theory and Practice in the Younger Pliny, pp. 332–376), demonstrating that, in comparison with Books I–IX, Book X is characterized by a lack of figures of speech and that its letters display a complex hypotactic structure instead of the parataxis and brevity typical of the first nine books. Conformity in genre, in register (all letters written to the Emperor Trajan), and in time of writing all contribute strongly to the uniformity of Book X.
The problem is PT. In every model considered it has the lowest SPI value, and in 4 models out of 6 it is clearly an outlier.
 
Porterble
Posted on 4/3/2016, 11:43

Congratulations on the paper, it is good to see computational linguistics getting a workout in the historical disciplines.

I was wondering whether you have considered repeating the analysis using LSA as the document comparison method? One such option may be to use the methodology of Satyam et al 2014 upon the n-gram analysis, or to simply apply LSA to the text itself and use the whole of the Plinian corpus as the comparator.
 
Administrator
Posted on 4/3/2016, 12:54

Review of the article by Prof. Larry Hurtado: https://larryhurtado.wordpress.com/2016/03...out-christians/
 
Administrator (Roma)
Posted on 4/3/2016, 16:09

I would be curious to see what happens if you run the same analysis on a random cut-and-paste sample from another Plinian work of exactly the same length as Ep. 10.96. Did you try? As already pointed out, the shortness of that text seems to me to be the critical issue to deal with.
 
Saulnier
Posted on 4/3/2016, 21:07

QUOTE (Porterble @ 4/3/2016, 11:43)
I was wondering whether you have considered repeating the analysis using LSA as the document comparison method? One such option may be to use the methodology of Satyam et al 2014 upon the n-gram analysis, or to simply apply LSA to the text itself and use the whole of the Plinian corpus as the comparator.

Thank you for this suggestion, I have not considered using Latent Semantic Analysis on n-grams. It sounds interesting. Can you provide some references?

QUOTE
I would be curious to see what happens if you run the same analysis on a random cut-and-paste sample from another Plinian work of exactly the same length as Ep. 10.96. Did you try? As already pointed out, the shortness of that text seems to me to be the critical issue to deal with.

Book 10 was divided into 15 subsections of the same length as PT (i.e. approximately 3000 characters), and they show relatively homogeneous stylistic behavior in all the models considered.
The models differ in the size of the n-grams and the length of the profile of known authorship. Varying these parameters (i.e. the models) helps to identify the model best able to capture stylistic differences between authors. For fragments by the same author these differences should matter less, and that is indeed the case for the 15 Plinian subsections. By contrast, my analysis shows that the behavior of Ep. 10.96 changes significantly from model to model (as happens with fragments of Cicero or Seneca). In 4 models out of 6, Ep. 10.96 lies outside the 99% confidence zone. We are not speaking of minor differences, and differences in topic can hardly explain them. Book 10 has one global topic (questions about administrative affairs in Pontus and Bithynia) common to all its letters, Ep. 10.96 included, and as many local topics as there are letters. So should we expect different stylistic behavior in Ep. 10.96 alone?
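As an illustration of what "outside the 99% confidence zone" can mean in practice, here is a small Python sketch. It assumes a two-sided interval around the mean of the 15 reference values under a normal model; the exact test used in the article may differ, and the function name `is_outlier` is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def is_outlier(reference: list[float], value: float, confidence: float = 0.99) -> bool:
    """True if value lies outside the two-sided confidence zone of reference.

    Assumption for this sketch: the reference values are treated as a sample
    from a normal distribution, described by its mean and sample standard
    deviation.
    """
    mu = mean(reference)
    sigma = stdev(reference)
    # Two-sided critical value, e.g. about 2.576 for confidence = 0.99.
    z_limit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return abs(value - mu) > z_limit * sigma
```

Here `reference` would hold the SPI values of the 15 subsections for one model and `value` the SPI of Ep. 10.96. With only 15 reference values, a Student t critical value would give a somewhat wider zone than the normal one used in this sketch.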
 
Porterble
Posted on 4/3/2016, 22:22

QUOTE (Saulnier @ 4/3/2016, 21:07) 
QUOTE (Porterble @ 4/3/2016, 11:43) 
I was wondering whether you have considered repeating the analysis using LSA as the document comparison method? One such option may be to use the methodology of Satyam et al 2014 upon the n-gram analysis, or to simply apply LSA to the text itself and use the whole of the Plinian corpus as the comparator.

Thank you for this suggestion, I have not considered using Latent Semantic Analysis on n-grams. It sounds interesting. Can you provide some references?

A method such as Satyam et al would work, using LSA upon the n-gram dataset:

The longer path would be to create a reference corpus of all of Pliny's works, and then conduct an n-way LSA comparison between the documents to build another similarity comparison. The LSA engine on the ColoradoU site could theoretically be used for this, but the data might need massaging into another format, as the last time I was heavily using the LSA engine it didn't support Unicode properly. However, it may have been updated more recently, as I haven't worked properly in computational linguistics since 2010 or so (most of my work was involved with the analysis of decision making and rhetorical intent).

Also, I was wondering whether anyone had done stylometric work of this kind that takes the presence of amanuenses into account?

Hm, the forum keeps stripping out the URLs. The Satyam paper is entitled:
A Statistical Analysis Approach to Author Identification Using Latent Semantic Analysis: Notebook for PAN at CLEF 2014
by Satyam, Anand, Arnav Kumar Dawn, and Sujan Kumar Saha

While the LSA engine is at lsa (dot) colorado (dot) edu.
 
Teodoro Studita, Administrator (Roma)
Posted on 7/3/2016, 20:01

QUOTE (Saulnier @ 4/3/2016, 21:07)
Book 10 was divided into 15 subsections of the same length as PT (i.e. approximately 3000 characters), and they show relatively homogeneous stylistic behavior in all the models considered.
The models differ in the size of the n-grams and the length of the profile of known authorship. Varying these parameters (i.e. the models) helps to identify the model best able to capture stylistic differences between authors. For fragments by the same author these differences should matter less, and that is indeed the case for the 15 Plinian subsections. By contrast, my analysis shows that the behavior of Ep. 10.96 changes significantly from model to model (as happens with fragments of Cicero or Seneca). In 4 models out of 6, Ep. 10.96 lies outside the 99% confidence zone. We are not speaking of minor differences, and differences in topic can hardly explain them. Book 10 has one global topic (questions about administrative affairs in Pontus and Bithynia) common to all its letters, Ep. 10.96 included, and as many local topics as there are letters. So should we expect different stylistic behavior in Ep. 10.96 alone?

OK. Assuming that PT is not homogeneous with Book X... did these chunks of Book X behave the same way as other 3k-character cut-and-paste samples from other Plinian works in the Letters corpus? In other words, maybe the problem is not PT but Book X?
 
Saulnier
Posted on 8/3/2016, 21:00

QUOTE (Teodoro Studita @ 7/3/2016, 20:01)
OK. Assuming that PT is not homogeneous with Book X... did these chunks of Book X behave the same way as other 3k-character cut-and-paste samples from other Plinian works in the Letters corpus? In other words, maybe the problem is not PT but Book X?

The method proposed by Potha and Stamatatos is a profile-based method. By constructing a composite author profile for Pliny from the Book 10 letters, I end up with a construct that includes all the Book 10 letters (Ep. 10.96 included) and considers only those characteristics (the most frequent n-grams) useful for capturing Pliny’s profile in Book 10. Are you asking whether I tried to compare other Plinian 3000-character samples (not from Book 10) with Pliny’s Book 10 profile? Or whether I tried to compare the 15 Plinian chunks of Book 10 with Pliny’s profile in Books 1-9? Gamberini’s stylistic analysis of Pliny’s correspondence makes it clear that Book 10 is markedly different from Books 1-9 of Pliny’s letters, and in fact the fragments from Book 10 have high SPI values (as one would expect); only Ep. 10.96 shows low or very low values.
Of course, with a profile-based method, comparing each 3000-character sample with each of the others makes little or no sense. We cannot apply a profile-based method without a profile, and a single Plinian fragment is too short to extract one from.
 
16 replies since 15/2/2016, 21:23   691 views