>>> from hristog.thoughts import random
Inspired by a recent visualization of Game of Thrones characters across seasons, I decided to perform a similar study on a dataset I had recently collected: transcripts of all episodes from all seasons of one of my favorite TV series, Friends.
The dataset consists of transcripts organized into a nested structure of seasons[1], episodes, and scenes. Each scene contains a sequence of lines spoken by characters, who are predominantly addressing each other. For instance, here is a simplified example of a scene:
[Scene: A Park, Alice and Bob are having a picnic.]
Alice: We're quite lucky with the weather today, aren't we?
Bob: Indeed, we are.
The focus of this study is to investigate two proxies of character importance for each individual season: the total number of lines allotted to a character, and the sum of the lengths[2] of all lines allotted to that character.
Of course, given the raw nature of the unprocessed data, these metrics are only approximations. Nevertheless, they are informative enough for the purposes and scope of this study.
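The two proxies described above can be sketched in a few lines of Python. This is only a minimal illustration, not the script used for the study: the `SPEAKER_RE` pattern and the `character_metrics` function are hypothetical names, and the sketch assumes every spoken line has the form "Speaker: utterance" and that bracketed lines are scene descriptions.

```python
import re
from collections import Counter

# Hypothetical pattern: everything before the first colon is the speaker label,
# everything after it is the utterance. Real transcripts are likely messier.
SPEAKER_RE = re.compile(r"^([^:\[\]]+):\s*(.*)$")


def character_metrics(scene_lines):
    """Return (line_counts, line_lengths) per speaker label for one scene."""
    line_counts = Counter()   # proxy 1: total number of lines
    line_lengths = Counter()  # proxy 2: total number of text characters spoken
    for raw in scene_lines:
        raw = raw.strip()
        # Skip blank lines and bracketed scene descriptions.
        if not raw or raw.startswith("["):
            continue
        match = SPEAKER_RE.match(raw)
        if match is None:
            continue  # not a "Speaker: utterance" line
        speaker = match.group(1).strip()
        utterance = match.group(2).strip()
        line_counts[speaker] += 1
        # Punctuation and spaces are counted as part of the line length.
        line_lengths[speaker] += len(utterance)
    return line_counts, line_lengths


scene = [
    "[Scene: A Park, Alice and Bob are having a picnic.]",
    "Alice: We're quite lucky with the weather today, aren't we?",
    "Bob: Indeed, we are.",
]
counts, lengths = character_metrics(scene)
print(counts)
print(lengths)
```

Summing these counters over all scenes of a season would give the per-season totals. Note that the sketch attributes a line shared by several characters to the combined speaker label as written, rather than splitting it between them.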
We can clearly observe well-formed[3] pairs of characters[4] as early as the end of the fourth season: Ross and Rachel, Monica and Chandler, and Joey and Phoebe. The total line length graphs reveal a further pair, Joey and Chandler[5], which is not as clear from the screen-time analysis alone.
From the beginning of the fifth season, however, when a significant event takes place in the personal life of one of the leading cast members, we see a dramatic reshuffling of the "leaderboard". In particular, the beginning of that season approximately coincides with the beginning of Jennifer Aniston's relationship with Brad Pitt[6], which eventually led to their marriage shortly after the end of season six (around halfway through 2000).
To put it mildly, as a result[7], the total amount of screen time[8] allotted to Aniston's character, Rachel, skyrockets in seasons 7 and 8.
Another dramatic anomaly in the graphs is the significantly reduced season eight screen time allotted to Chandler (portrayed by the wonderful Matthew Perry). A quick check on what was going on in the actor's life around that time reveals that he had been battling opioid addiction (and reportedly entered rehab near the end of the seventh season).
The plan is to eventually release the Jupyter Notebook and source code used for the analysis on GitHub (once I bring them to a neat enough state), and to write an accompanying blog post here that explains the various implementation and design decisions and provides minimally sufficient background on the technologies employed.
[1] Due to poorly formatted raw data for the second season, the corresponding episodes were excluded from the analysis. I plan to customize my script further so that the missing data can be included later.
[2] While line length can be viewed as a proxy for the total amount of screen time allotted to a given character, please note that the actual temporal dimension was not considered in this study in any way. For completeness, line length is defined as the total number of text characters, including punctuation (which is assumed to account for additional screen time), that make up a given line. A line, in turn, is defined as a contiguous sequence of sentences allotted to a set of characters (usually a single character).
[3] The notion of a pair here, however, is relatively vague, given the quintessentially fluid nature of interactions in a sitcom.
[4] Of course, this is by no means an estimate of the amount of intra-pair interaction between said characters. It should be thought of more as a ranking based on character importance or reputation, corresponding to a particular period in the timeline of the series.
[5] At this point, the average die-hard fan of the show is probably reminded of quintessential moments like Joey's "No more J-man and Channie's" line, after Chandler informed Joey that he'd be moving out to live with Monica.
[6] In keeping with tradition, like the various other celebrities who were partners of cast members at one point or another, Brad Pitt eventually made an appearance on the show, portraying Will Colbert, who was presented as a former high school classmate of Ross, Monica and Rachel.
[7] Naturally, this is only a correlation outlined by the data. This study does not intend, in any way, to underestimate Jennifer Aniston's acting skills or visual characteristics (or any other factor that might have contributed to her amazing success, for that matter).
[8] As estimated via the total number of lines, as well as its cousin statistic, total line length.