**Digital Humanities Uni Potsdam** @dh_potsdam@hcommons.social · Apr 2

Stylometry… That’s exactly what it looks like:
you i the a to and that in of it me is this what no on your yeah don't my do…

and so on! Especially if you’re working with movie dialogues
#DHSpringSchool #stylometry
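A list like the one in the post above is what you get when you rank the most frequent words in a text. The following is a minimal sketch of that counting step, not the tooling used at the Spring School; the function name `most_frequent_words` and the sample dialogue are made up for illustration, and the simple tokenizer only handles lowercase ASCII letters and straight apostrophes.

```python
import re
from collections import Counter


def most_frequent_words(text, n=10):
    """Return the n most frequent word tokens in text, lowercased."""
    # Keep straight apostrophes so contractions such as "don't" stay one token.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens).most_common(n)


# Invented sample line standing in for a movie dialogue.
dialogue = "You know what? I don't know. You tell me what you know."
top_words = most_frequent_words(dialogue, n=3)
print(top_words)
```

Stylometric tools typically turn such counts into relative frequencies per text before comparing authors, so that text length does not dominate the result.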

**Digital Humanities Uni Potsdam** @dh_potsdam@hcommons.social · Apr 2 *

Jan Rybicki: You can count many things in films, like dead bodies. That alone can be a good proxy for genre.
If there’s one, it’s a love story that ends badly
If there are two, it’s a love story that ends even worse
Three to six? A crime story
A hundred? That’s Rambo
#DHSpringSchool #Stylometry #DigitalHumanities

**Digital Humanities Uni Potsdam** @dh_potsdam@hcommons.social · Apr 2

“Much of what I'll show you will be the colourful pictures of my own failures” — Jan Rybicki (Uniwersytet Jagielloński), in his usual manner, delivers a witty & extremely educational lecture on #Stylometry and #DistantReading in Film after an obligatory bit of ironic self-humiliation #DHSpringSchool

**CyberFrog** @froge@social.glitched.systems · Jan 28

Reminder for those who may not realize this: #Stylometry is kind of an insane field of study, and you can be uniquely identified based on your writing style alone.

This has, in the past, been applied to open source developers and programming code too, and it was found that using stylometry techniques you can identify the author of a compiled binary based on their open source code style ~78% of the time

https://arxiv.org/pdf/1512.08546v1

Luckily, there are some techniques to avoid this. They involve fairly basic changes to your writing style and structure that can very effectively anonymize things again:

https://en.wikipedia.org/wiki/Adversarial_stylometry

**Staatsbibliothek zu Berlin** @stabi_berlin@openbiblio.social · Jan 17

In our #StabiLab you can try out Digital Humanities hands-on! On Tuesday, January 21, you will learn how to research literature with the tool #Stylo http://sbb.berlin/59m32

#StabiBerlin #Stylometry #Stylometrie

**Blake Eskin** @bdeskin@saturation.social · Dec 27, 2024

“Mosin also told the BBC about an incident where Abaturov’s grandmother came to the university to demand an explanation for why her grandson was given a ‘B’ instead of an ‘A.’” #wikipedia #stylometry

https://meduza.io/en/feature/2024/12/27/anthropologist-alexandra-arkhipova-and-the-bbc-unmask-the-historian-responsible-for-denouncing-opposition-minded-russians-from-behind-a-pseudonym

Meduza · Dec 27, 2024 · Anthropologist Alexandra Arkhipova and the BBC unmask the historian responsible for ‘denouncing’ opposition-minded Russians from behind a pseudonym · By Meduza

**Christof Schöch** @christof@fedihum.org · Dec 5, 2024

Later today at #CHR2024, we are going to present our work on #Multilingual #Stylometry!

We isolated the influence of #language on #authorship #attribution #accuracy by translating multiple #corpora into each other's languages while keeping #corpus composition stable.

Interactive showcase: https://showcases.clsinfra.io/stylometry

Full paper: https://ceur-ws.org/Vol-3834/paper9.pdf

This work was developed within the @CLSinfra project in #Trier, #Krakow and #Prague with Artjoms Šeļa, Evgeniia Fileva and Julia Dudar.

Two colorful heatmaps, in hues of blue, yellow and red; one above the other; with various dropdown lists on the left to vary parameters.

Continued thread

**immibis** @immibis@social.immibis.com · Nov 22, 2024 *

#StackOverflow said no because they aren't infringing my copyright, so I sent them a #GDPR erasure request, noting that due to #AI #stylometry all posts and comments are personally identifying information.

**JCLS** @jcls@fedihum.org · Nov 14, 2024

Agapitos and van Cranenburgh use computational #stylometry to show that while 'Octavia' and 'Hercules Oetaeus' were largely written by #Seneca, a closer analysis of the text segments reveals signs of mixed #authorship. https://doi.org/10.48694/jcls.3919 #CLS #CCLS24 #Classics #AuthorshipVerification

Journal of Computational Literary Studies · A Stylometric Analysis of Seneca's disputed plays. Authorship Verification of Octavia and Hercules Oetaeus

Seneca's authorship of Octavia and Hercules Oetaeus is disputed. This study employs established computational stylometry methods based on character n-gram frequencies to investigate this case. Based on a Principal Component Analysis (PCA) of stylistic similarities within the Senecan corpus, Octavia and Phoenissae emerge as outliers, while Hercules Oetaeus only stands out when the text is split in half. Subsequently, applying PCA and Bootstrap Consensus Trees (BCT) to a corpus of distractor texts, both disputed plays align with the Senecan cluster/branch. The General Impostors method confidently reports Seneca as the author of the disputed plays under various scenarios. However, upon closer examination of text segments, indications of mixed authorship arise. Based on computational stylometry, it appears that the disputed plays were in large part, but not wholly, written by Seneca.

**Nanette Rissler-Pipka** @NanetteRissler@fedihum.org · Oct 2, 2024

Look what landed on my doorstep! The book is also available #OpenAccess online at #heiUP: https://heiup.uni-heidelberg.de/catalog/book/1157 and I would like to thank the very patient editors, who had to deal with switching the publisher and coming up with ways to improve the quality of the illustrations in my article about #stylometry in #French and #Spanish for #Picasso's writings: @christof @josecalvo @u_henny and Robert Hesselbach, Daniel Schlör

**Till Grallert** @tillgrallert@digitalcourage.social · Sep 27, 2024

If you are interested in computational approaches to #Arabic and #stylometry, you can join us for two hybrid sessions at #DAVO2024 this afternoon with papers by Maxim Romanow (Hamburg), Maroussia Bednarkiewicz (Tübingen), Xenia Kudela (Berlin), Aslisho Qurboniev (London), and myself.

Session 1: https://gesellschaften-im-wandel30.de/frontend/index.php?page_id=37220&v=List&do=15&day=5295&ses=33189#anker_session_33189

Session 2: https://gesellschaften-im-wandel30.de/frontend/index.php?page_id=37220&v=List&do=15&day=5295&ses=33190#anker_session_33190

Zoom link: https://uni-goettingen.zoom-x.de/j/69656412607?pwd=a2bbBLdGKYJdyfl6PGlwNg8fdvPNXQ.1

gesellschaften-im-wandel30.de · Conference schedule – Gesellschaften im Wandel: Recht, Kultur und Politik im Vorderen Orient

#MultilingualDH

**Stefano Zacchiroli** @zacchiro@mastodon.xyz · Sep 6, 2024 *

New #paper out: « Code #stylometry vs formatting and minification » https://peerj.com/articles/cs-2142/ , where we show how much current code stylometry techniques (i.e., how to automatically detect the author of a source code snippet) are resistant to automatic code formatting and minification. (Spoiler: quite a bit, authors can still be identified after those source-to-source transformations.) Available #openaccess on #PeerJ CS.

PeerJ Computer Science · Code stylometry vs formatting and minification

The automatic identification of code authors based on their programming styles (known as authorship attribution or code stylometry) has become possible in recent years thanks to improvements in machine learning-based techniques for author recognition. Once feasible at scale, code stylometry can be used for well-intended or malevolent activities, including: identifying the most expert coworker on a piece of code (if authorship information goes missing); fingerprinting open source developers to pitch them unsolicited job offers; de-anonymizing developers of illegal software to pursue them. Depending on their respective goals, stakeholders have an interest in making code stylometry either more or less effective. To inform these decisions we investigate how the accuracy of code stylometry is impacted by two common software development activities: code formatting and code minification. We perform code stylometry on Python code from the Google Code Jam dataset (59 authors) using a code2vec-based author classifier on concrete syntax tree (CST) representations of input source files. We conduct the experiment using both CSTs and ASTs (abstract syntax trees). We compare the respective classification accuracies on: (1) the original dataset, (2) the dataset formatted with Black, and (3) the dataset minified with Python Minifier. Our results show that: (1) CST-based stylometry performs better than AST-based (51.00%→68%), (2) code formatting makes a significant dent (15%) in code stylometry accuracy (68%→53%), with minification subtracting a further 3% (68%→50%). While the accuracy reduction is significant for both code formatting and minification, neither is enough to make developers non-recognizable via code stylometry.

**Christof Schöch** @christof@fedihum.org · Aug 9, 2024 *

Interesting! Dominika Weronska on "A Stylometric Glance at Basque Novels" at #DH2024. #stylometry

The author did stylometric analyses on 57 Basque novels, a first!

**Christof Schöch** @christof@fedihum.org · Aug 8, 2024 *

Now up at #DH2024, Maciej Eder, developer of #stylo and co-organizer of #DH2016 in #Krakow, on various distance measures for #Stylometry: "Manhattan, Euclidean and their Siblings. Exploring Exotic Measures of Text Similarities...".

Key idea: Manhattan distance is L1-norm based, Euclidean is L2. But we can vary this parameter for a wide range of values, from 0.1 to 10. Then evaluate accuracy for authorship attribution.

Result: For longer vectors, it pays off to use a value of less than 1!

Maciej on stage with slides next to him.

Maciej at the lectern with a slide behind him showing performance in a line plot, depending on two parameters: vector length and L value.
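The key idea in the post above, treating the norm order as a tunable parameter, can be sketched with a Minkowski-style distance. This is an illustrative sketch only, not the implementation from the talk or from stylo; the function name `minkowski_distance` and the two frequency vectors are invented for the example. Note that for values below 1 the result is no longer a metric in the strict sense (the triangle inequality can fail), but it can still be computed and used to rank texts.

```python
def minkowski_distance(u, v, p):
    """Minkowski-style distance between two equal-length frequency vectors.

    p = 1 gives Manhattan distance, p = 2 gives Euclidean distance.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


# Invented relative word frequencies for two texts (hypothetical data).
text_a = [0.050, 0.030, 0.020, 0.010]
text_b = [0.040, 0.035, 0.010, 0.015]

# Sweep the parameter over the range mentioned in the post (0.1 to 10).
for p in (0.1, 0.5, 1, 2, 10):
    print(p, minkowski_distance(text_a, text_b, p))
```

In an attribution experiment, one would repeat such a sweep for each vector length and record the classification accuracy, which is roughly what the line plot described above appears to show.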

**Frank Fischer** @umblaetterer@chaos.social · May 30, 2024 *

Just ran a quick test: stylo() can tell apart the different verse forms in Goethe quite reliably: dramas in alexandrines, Knittelvers, blank verse, and mixed verse, as well as the two hexametric epics.

(Full texts via #DraCor and @gutenberg_org, respectively.)

#DigitalHumanities #Stylometry

Replied in thread

**Dragon-sided D** @dragonsidedd@sciencemastodon.com · May 23, 2024

@dvergano … until you start using techniques to defend against #stylometry

https://www.whonix.org/wiki/Stylometry

(One of the many reasons I love and support the #whonix project)

Whonix · Nov 9, 2023 · Stylometry · Deanonymization based on a user's linguistic style.

Replied in thread

**JCLS** @jcls@fedihum.org · Apr 22, 2024

@jcls Another paper we would like to highlight, again for the lovers of #novels

Dorothy Henriette Modrall Sperling, Mike Kestemont & Vincent Neyt (2023), “The Authorship of Stephen King’s Books Written Under the Pseudonym “Richard #Bachman”: A Stylometric Analysis”, Journal of Computational Literary Studies 2(1), 1–35. doi: https://doi.org/10.48694/jcls.3594

Keywords: #Stephen_King, #stylometry, #pop_culture, #authorship verification, contemporary English-language #fiction

Figure: Boxplot in simple black lines against white background, showing the absolute frequencies of pop-culture references in 100 randomly-selected 10,000-token segments from each Bachman, Harris, King, Koontz, and Straub book. "Bachman" has the highest reference count of all authors, King is next.

Continued thread

**JCLS** @jcls@fedihum.org · Apr 15, 2024

This next paper is about #stylometry in a #translation setting involving novels in #Swedish and #Dutch:

Martje Wijers (2023), “Why the Daisy sisters are different. A stylometric study on the oeuvre of Swedish author Henning #Mankell and the Dutch translations of his work”, Journal of Computational Literary Studies 2 (1), 1–27. doi: https://doi.org/10.48694/jcls.3585

Keywords: #stylometry, #cluster analysis, #PCA, #delta, #zeta, #translation

Figure: Consensus network of the translated Dutch corpus: classic Delta distance, 100–1,000 MFWs, modularity 0.6. Colorful network graph with several clusters of works.
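The "classic Delta distance" over most frequent words (MFWs) named in the figure description is usually understood as Burrows's Delta: the mean absolute difference between the z-scored frequencies of two texts. The sketch below illustrates that formula under stated assumptions and is not the paper's code; the function name `delta_distances`, the text labels, and the frequencies are invented, and the population standard deviation is used here as a simplifying choice.

```python
from statistics import mean, pstdev


def delta_distances(freqs):
    """Pairwise Burrows's Delta for texts sharing one MFW list.

    freqs maps a text label to a list of relative frequencies, one per word
    in the same most-frequent-word list.
    """
    names = list(freqs)
    n_words = len(freqs[names[0]])
    # Collect each word's frequencies across the corpus to standardize them.
    columns = [[freqs[name][i] for name in names] for i in range(n_words)]
    means = [mean(col) for col in columns]
    stdevs = [pstdev(col) for col in columns]
    # z-score per word; a word with no variation contributes nothing.
    z = {
        name: [
            (freqs[name][i] - means[i]) / stdevs[i] if stdevs[i] > 0 else 0.0
            for i in range(n_words)
        ]
        for name in names
    }
    # Delta is the mean absolute z-score difference for each pair of texts.
    return {
        (a, b): mean(abs(x - y) for x, y in zip(z[a], z[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical relative frequencies for three texts over three MFWs.
corpus = {
    "text_a": [0.050, 0.030, 0.020],
    "text_b": [0.048, 0.031, 0.021],
    "text_c": [0.030, 0.045, 0.010],
}
for pair, delta in delta_distances(corpus).items():
    print(pair, round(delta, 3))
```

A consensus network like the one in the figure would typically be built by repeating such a computation for several MFW list lengths (here 100 to 1,000) and linking texts that are repeatedly nearest neighbours.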

**Christof Schöch** @christof@fedihum.org · Nov 11, 2023

A spontaneous Saturday afternoon provocation: "Dear fellow stylometrists, let’s drop the dendrogram and cherish the distance matrix": https://dragonfly.hypotheses.org/1414

Large clustermap in which a heatmap is organized into clusters, shown here in tones of yellow, orange, red and black, for the distances between pairs of novels in a set of 30 English novels.

#stylometry #dendrogram #distances

**Christof Schöch** @christof@fedihum.org · Oct 11, 2023

Very happy to participate in today's workshop on "Potentials and Limits of #Stylometry for Early Modern Text in #Romance Languages". It's co-organized by the "Pamphlets and Patrons" #PAPA project in Early Modern French History and the Trier Center for Digital Humanities @tcdh.

The programme is here: https://tcdh.uni-trier.de/en/event/hybrid-workshop-potentials-and-limits-stylometry-early-modern-text-romance-languages

tcdh.uni-trier.de · Hybrid Workshop: Potentials and Limits of Stylometry for Early Modern Text in Romance Languages | KOMPETENZZENTRUM - TRIER CENTER FOR DIGITAL HUMANITIES · Workshop, 11.10.2023 - The goal of the workshop is to show practical examples of stylometric studies. Presenters are invited to share difficulties encountered, solutions that were applied, and ongoing problems. Each presentation will be followed by a discussion.

#CLS #Romanistik #Trier

#stylometry