Draft

This article is a draft.
Please do not share or link to this URL until I remove this notice

2024 Analysis of Traffic to martinfowler.com

21 March 2024



Although I do keep an eye on traffic to the site, especially with newly published articles, it's been a while since I've done a proper overview of the traffic to martinfowler.com. (The previous ones were in 2014 and 2018.) So in between my editing work, I've spent some time looking through the traffic data. Currently this page summarizes what I've understood so far, but I'll be adding more sections as I probe further.

Traffic peaked around 2020

Here's the plot of monthly page views since I started to track traffic in 2011.

We see the overall traffic flow shows a steady increase from 2011 to 2020, followed by a similarly steady decrease. Traffic levels in 2023 were about the same as 2015.

If you look carefully at the plot, you might notice there are points missing in 2019. This is due to a strange phenomenon in the data I got from Google. During a few months in 2019 the traffic of some of the navigation pages mentioned in my navigation bar (eg /agile.html) shot up from around 5000 monthly views to 221,000. It only affected a handful of pages, but a jump that big throws off monthly (and yearly) totals.

Here are the totals for each year.

| year | median monthly | total for year |
| --- | --- | --- |
| 2011 | 306,706 | 5,334,888,024 |
| 2012 | 356,570 | 8,492,974,172 |
| 2013 | 409,300 | 9,094,607,474 |
| 2014 | 492,169 | 11,505,310,904 |
| 2015 | 594,879 | 16,060,940,907 |
| 2016 | 628,729 | 18,600,995,621 |
| 2017 | 647,936 | 24,236,927,419 |
| 2018 | 717,882 | 26,818,308,778 |
| 2019 | 801,846 | NA† |
| 2020 | 759,154 | 37,272,861,281 |
| 2021 | 711,292 | 34,080,092,653 |
| 2022 | 649,330 | 27,707,877,323 |
| 2023 | 597,662 | 19,093,317,894 |

† Due to the strange Google phenomenon described above, the annual total for 2019 is meaningless.
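To illustrate how a spike in a handful of months distorts a yearly total much more than a monthly median, here is a minimal Python sketch. The monthly figures are made up for illustration, and `yearly_summary` is a hypothetical helper, not the code used for this analysis.

```python
from statistics import median

def yearly_summary(monthly_views):
    """Return (median monthly views, total views) for one year of monthly counts."""
    return median(monthly_views), sum(monthly_views)

# Made-up monthly page views (in thousands), not real site data.
steady_year = [800] * 12
spiky_year = [800] * 9 + [3000] * 3  # three months inflated by an anomaly

print(yearly_summary(steady_year))
print(yearly_summary(spiky_year))
```

In this toy data the median is identical for both years, while the total for the spiky year is far higher.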

It's interesting to see the overall traffic, but that's only the start, and probably not the most interesting information about what's happening with the site. But to delve further, we need to understand a bit more about the general shape of the website's traffic.

Traffic drops rapidly after publication

The first, important, general pattern about this site (and I suspect most websites) is the traffic pattern for any page after it's released.

The following plot shows how many views an average article on the site gets on the days following its publication.

The thing to note here is the extreme drop-off in traffic after the first days. Not only is it an exponential curve, it's an exponential curve even when plotted on a logarithmic scale.

To plot an “average article”, I'm actually looking at every article published since 2020, finding the page views for each day after publication, and taking the median for each day.

The second thing to note is that things tend to stabilize after about six months.
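The “average article” calculation described above can be sketched roughly as follows. This is a minimal illustration, assuming each article's traffic is available as a list of daily view counts starting at its publication day; the function name and the sample numbers are hypothetical.

```python
from statistics import median

def median_views_by_day(articles):
    """articles[i][d] is the view count of article i on day d after publication.

    Returns a list whose entry d is the median over all articles
    that have data for day d.
    """
    longest = max((len(views) for views in articles), default=0)
    result = []
    for day in range(longest):
        values = [views[day] for views in articles if len(views) > day]
        result.append(median(values))
    return result

# Made-up daily counts for three articles, not real site data.
sample = [
    [900, 400, 120, 60],
    [1500, 700, 200],
    [300, 150, 90, 40],
]
print(median_views_by_day(sample))
```

Using the median rather than the mean keeps one unusually popular article from dominating the curve.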

Traffic is lower at weekends (and Christmas)

You'll notice that I've said weekdays above. The next graph shows why.

Here I've plotted the daily traffic, and color-coded between weekdays and weekends. The graph clearly shows how weekend traffic is much lower than weekday traffic. An example consequence of this is that if I want to compare articles for their post-launch traffic, I need to stick to weekdays, otherwise there will be strange differences between articles published on Tuesdays and articles published on Thursdays.

There's another traffic variation, which isn't obvious in the graphic above, but shows up if I just plot weekday daily traffic.

Here we can see a notable drop-off at the end of year, due to the Christmas and new year holidays. This is why monthly figures for December are always lower than the rest of the year.
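Separating weekday from weekend traffic, as in the plots above, might look something like this in Python. It is only a sketch: the `{date: views}` layout, the helper name, and the numbers are assumptions for illustration.

```python
from datetime import date

def split_weekdays(daily_views):
    """Split a {date: views} mapping into (weekday, weekend) mappings."""
    weekdays, weekends = {}, {}
    for day, views in daily_views.items():
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        target = weekdays if day.weekday() < 5 else weekends
        target[day] = views
    return weekdays, weekends

# Made-up daily counts, not real site data.
sample_days = {
    date(2024, 1, 5): 21000,  # Friday
    date(2024, 1, 6): 9000,   # Saturday
    date(2024, 1, 7): 9500,   # Sunday
    date(2024, 1, 8): 22000,  # Monday
}
weekday_views, weekend_views = split_weekdays(sample_days)
print(sorted(weekday_views))
```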

A small amount of articles get most of the traffic

How much traffic do pages on the site get? To start answering this question, I actually have to ask what counts as a page on the site. If I look at a list of traffic counts, I'll see a lot of requested paths that don't lead to real pages, such as /articles/colelction-pipeline or /articles/cmljaGFyZH. Those paths are actually a remarkably large proportion of the paths requested on the site. In January 2024, 936 of the 2506 paths requested for the site were unknown.

Fortunately, when it comes to overall traffic analysis, I don't have to worry about it that much, because the number of requests to these paths is rather small.

It turns out to be about 0.5%.

From that I think it's reasonable to concentrate on the known paths, which in January 2024, numbered some 1570 URLs.
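The distinction between the share of paths and the share of requests can be sketched as below. This is an illustrative sketch only: the helper name is hypothetical and the view counts are invented so that the unknown paths make up half of the paths but only 0.5% of the views.

```python
def unknown_share(views_by_path, known_paths):
    """Return (share of paths that are unknown, share of views going to them)."""
    unknown = {p: v for p, v in views_by_path.items() if p not in known_paths}
    path_share = len(unknown) / len(views_by_path)
    view_share = sum(unknown.values()) / sum(views_by_path.values())
    return path_share, view_share

# Made-up view counts, not real site data.
views = {
    "/articles/collection-pipeline/": 900,
    "/bliki/Yagni.html": 95,
    "/articles/colelction-pipeline": 3,  # mistyped path
    "/articles/cmljaGFyZH": 2,           # junk path
}
known = {"/articles/collection-pipeline/", "/bliki/Yagni.html"}

path_share, view_share = unknown_share(views, known)
print(path_share, view_share)
```

For the real January 2024 data, the path share is 936 / 2506, or roughly 37%, against a request share of about 0.5%.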

My usual first step when looking at something like this is to plot a histogram, so I can get a sense for the frequency distribution of the data, in this case the traffic for each of these pages in the month of January.

What this tells me is that we have a really skewed distribution, skewed even though I'm plotting it here with a square root y axis. I can tell that most pages get very few views, but a few pages get a lot.

That's useful information, but I want a bit more granularity than that. I can get this by plotting the cumulative traffic, which I do by sorting the pages by how many views they get, calculating the cumulative traffic for each page, and making this plot.

Each dot on this plot is a single page; towards the right of the plot all the dots squish together into a line. I can read it by following the 10% line from the y axis and seeing that it hits the squished points at roughly 520K page views. What this tells me is that the top 10% of pages on the site generated roughly 520K page views. By looking at the initial dots I can see that 150K page views were generated by the top 5 pages.

Seeing this in terms of page views helps us understand the graph, but once we've got the hang of it, I find it more useful to think in terms of proportion of page views.

This is exactly the same curve, but now I read that the top 10% of pages generated just over 80% of the traffic on the site.

I don't know enough about statistics to know how a real data scientist would react to a plot like this. It's very similar to a Cumulative Distribution Function.
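The cumulative calculation behind these plots (sort pages by views, accumulate, then divide by the total) can be sketched in a few lines of Python. The function names and the sample counts are hypothetical, and rounding the page count is my own choice for the sketch.

```python
from itertools import accumulate

def cumulative_shares(views):
    """Sort page views from largest to smallest and return the running
    share of total traffic after each page."""
    ordered = sorted(views, reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]

def top_share(views, page_fraction):
    """Share of traffic generated by the top `page_fraction` of pages."""
    shares = cumulative_shares(views)
    count = max(1, round(len(shares) * page_fraction))
    return shares[count - 1]

# Made-up monthly views for ten pages, not real site data.
sample_views = [500, 200, 100, 50, 50, 40, 30, 15, 10, 5]
print(top_share(sample_views, 0.1))
print(top_share(sample_views, 0.2))
```

In this toy data the top 10% of pages (one page) accounts for half the traffic, and the top 20% for 70%.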

I can also show the main points in a table.

| proportion of pages | proportion of traffic | cumulative traffic |
| --- | --- | --- |
| 0% | 9% | 57K |
| 5% | 70% | 441K |
| 10% | 82% | 519K |
| 15% | 88% | 555K |
| 20% | 91% | 576K |
| 25% | 94% | 590K |
| 30% | 95% | 600K |

The top pages on the site

Here's a table of the top 10% of pages on the site for 2023.

| rank | path | total | median | launch_date | type |
| --- | --- | --- | --- | --- | --- |
| 1 | / | 512497 | 40501 | | nav |
| 2 | /architecture/ | 232748 | 20512 | | nav |
| 3 | /articles/practical-test-pyramid.html | 183933 | 15371 | 2018-02-14 | post |
| 4 | /articles/microservices.html | 156410 | 12889 | 2014-03-10 | post |
| 5 | /articles/data-mesh-principles.html | 151849 | 12356 | 2020-12-03 | post |
| 6 | /bliki/CQRS.html | 146743 | 12000 | 2011-07-14 | post |
| 7 | /refactoring/ | 144215 | 11276 | | nav |
| 8 | /articles/micro-frontends.html | 141922 | 11704 | 2019-06-10 | post |
| 9 | /articles/2023-chatgpt-xu-hao.html | 141471 | 4400 | 2023-04-13 | post |
| 10 | /agile.html | 131963 | 9579 | | nav |
| 11 | /articles/feature-toggles.html | 129180 | 10634 | 2016-01-19 | post |
| 12 | /articles/data-monolith-to-mesh.html | 128921 | 10584 | 2019-05-13 | post |
| 13 | /articles/modularizing-react-apps.html | 121523 | 5627 | 2023-02-07 | post |
| 14 | /aboutMe.html | 119872 | 8308 | | nav |
| 15 | /eaaDev/EventSourcing.html | 94263 | 7865 | 2005-12-12 | post |
| 16 | /articles/patterns-of-distributed-systems/ | 92676 | 6614 | 2020-08-04 | post |
| 17 | /articles/richardsonMaturityModel.html | 91658 | 7390 | 2010-03-18 | post |
| 18 | /bliki/BoundedContext.html | 90091 | 7572 | 2014-01-15 | post |
| 19 | /articles/mocksArentStubs.html | 85243 | 7034 | 2007-01-02 | post |
| 20 | /refactoring/catalog/ | 83446 | 7102 | | other |
| 21 | /articles/injection.html | 78452 | 6652 | 2004-01-23 | post |
| 22 | /bliki/StranglerFigApplication.html | 67297 | 5408 | 2004-06-29 | post |
| 23 | /articles/branching-patterns.html | 65644 | 5404 | 2020-04-20 | post |
| 24 | /bliki/DomainDrivenDesign.html | 65564 | 5504 | 2020-04-22 | post |
| 25 | /bliki/TwoHardThings.html | 65533 | 5400 | 2009-07-14 | post |
| 26 | /articles/microservice-testing/ | 65136 | 5238 | 2014-11-18 | post |
| 27 | /bliki/CircuitBreaker.html | 63611 | 5300 | 2014-03-06 | post |
| 28 | /articles/exploring-gen-ai.html | 62640 | 8031 | 2023-07-26 | post |
| 29 | /books/refactoring.html | 52920 | 4466 | | other |
| 30 | /bliki/ConwaysLaw.html | 52147 | 3512 | 2022-10-20 | post |
| 31 | /eaaCatalog/ | 49413 | 4056 | | other |
| 32 | /books/eaa.html | 47224 | 4068 | | other |
| 33 | /microservices/ | 45399 | 3990 | 2015-07-08 | nav |
| 34 | /bliki/GivenWhenThen.html | 44623 | 3784 | 2013-08-21 | post |
| 35 | /articles/on-pair-programming.html | 43545 | 3540 | 2020-01-15 | post |
| 36 | /bliki/AnemicDomainModel.html | 43148 | 3542 | 2003-11-25 | post |
| 37 | /bliki/DDD_Aggregate.html | 40937 | 3334 | 2013-04-23 | post |
| 38 | /articles/headless-component.html | 40130 | 5402 | 2023-11-01 | post |
| 39 | /tags/domain%20driven%20design.html | 37907 | 3901 | | other |
| 40 | /bliki/Yagni.html | 35378 | 2855 | 2015-05-26 | post |
| 41 | /articles/201701-event-driven.html | 34880 | 2864 | 2017-02-07 | post |
| 42 | /eaaCatalog/repository.html | 34717 | 2820 | | other |
| 43 | /articles/platform-teams-stuff-done.html | 33964 | 2643 | 2023-07-19 | post |
| 44 | /bliki/TeamTopologies.html | 33333 | 2331 | 2023-07-25 | post |
| 45 | /eaaCatalog/dataTransferObject.html | 32927 | 2745 | | other |
| 46 | /articles/building-boba.html | 31979 | 1089 | 2023-06-29 | post |
| 47 | /articles/continuousIntegration.html | 31961 | 2779 | 2024-01-18 | post |
| 48 | /articles/is-quality-worth-cost.html | 31737 | 2398 | 2019-05-29 | post |
| 49 | /articles/linking-modular-arch.html | 31028 | 1087 | 2023-06-13 | post |
| 50 | /bliki/TestPyramid.html | 30775 | 2559 | 2012-05-01 | post |
| 51 | /bliki/UbiquitousLanguage.html | 30379 | 2550 | 2006-10-31 | post |
| 52 | /books/ | 30194 | 2408 | | other |
| 53 | /bliki/BlueGreenDeployment.html | 29788 | 2288 | 2010-03-01 | post |
| 54 | /articles/patterns-of-distributed-systems/two-phase-commit.html | 29786 | 2422 | 2022-01-18 | post |
| 55 | /bliki/PageObject.html | 29599 | 2471 | 2013-09-10 | post |
| 56 | /bliki/TestDouble.html | 29245 | 2400 | 2006-01-17 | post |
| 57 | /bliki/MonolithFirst.html | 28956 | 2304 | 2015-06-03 | post |
| 58 | /articles/lmax.html | 28532 | 2418 | 2011-07-12 | post |
| 59 | /articles/serverless.html | 28095 | 2174 | 2016-06-15 | post |
| 60 | /articles/break-monolith-into-microservices.html | 27830 | 2184 | 2018-04-24 | post |
| 61 | /articles/scaling-architecture-conversationally.html | 27672 | 2274 | 2021-11-30 | post |
| 62 | /articles/ship-show-ask.html | 27287 | 2170 | 2021-09-08 | post |
| 63 | /bliki/CanaryRelease.html | 26455 | 2175 | 2014-06-25 | post |
| 64 | /eaaCatalog/unitOfWork.html | 26182 | 2181 | | other |
| 65 | /bliki/UnitTest.html | 25316 | 2124 | 2014-05-05 | post |
| 66 | /bliki/TellDontAsk.html | 24753 | 2075 | 2013-09-05 | post |
| 67 | /bliki/TechnicalDebtQuadrant.html | 24350 | 1890 | 2014-11-19 | post |
| 68 | /articles/consumerDrivenContracts.html | 24169 | 1999 | 2006-06-12 | post |
| 69 | /bliki/ValueObject.html | 23435 | 1919 | 2016-11-14 | post |
| 70 | /bliki/Slack.html | 23184 | 351 | 2023-04-04 | post |
| 71 | /bliki/RulesEngine.html | 21285 | 1750 | 2009-01-07 | post |
| 72 | /eaaCatalog/domainModel.html | 21275 | 1728 | | other |
| 73 | /testing/ | 20999 | 1763 | | nav |
| 74 | /articles/evodb.html | 20287 | 1623 | 2016-05-01 | post |
| 75 | /articles/cd4ml.html | 19910 | 1620 | 2019-09-03 | post |
| 76 | /articles/developer-effectiveness.html | 19424 | 1599 | 2021-01-05 | post |
| 77 | /articles/retrospective-antipatterns.html | 19422 | 398 | 2023-02-15 | post |
| 78 | /articles/patterns-legacy-displacement/ | 19098 | 1536 | 2021-07-20 | post |
| 79 | /articles/products-over-projects.html | 19092 | 1570 | 2017-11-16 | post |
| 80 | /articles/dependency-composition.html | 19030 | 772 | 2023-05-23 | post |
| 81 | /eaaDev/uiArchs.html | 18841 | 1468 | 2006-07-18 | post |
| 82 | /bliki/PresentationDomainDataLayering.html | 18687 | 1382 | 2015-08-26 | post |
| 83 | /articles/2023-chatgpt-tech-writing.html | 17167 | 888 | 2023-04-26 | post |
| 84 | /bliki/IntegrationTest.html | 17003 | 1416 | 2018-01-16 | post |
| 85 | /bliki/ContractTest.html | 16690 | 1356 | 2011-01-12 | post |
| 86 | /articles/bottlenecks-of-scaleups/03-product-v-engineering.html | 16321 | 1304 | 2022-10-10 | post |
| 87 | /dsl.html | 15741 | 1300 | | nav |
| 88 | /articles/creating-integrated-tech-strategy.html | 15735 | 685 | 2023-08-08 | post |
| 89 | /bliki/TechnicalDebt.html | 15642 | 1332 | 2019-05-21 | post |
| 90 | /articles/patterns-of-distributed-systems/paxos.html | 15592 | 1260 | 2022-01-05 | post |
| 91 | /eaaCatalog/transactionScript.html | 15465 | 1262 | | other |
| 92 | /articles/eurogames/ | 15158 | 1174 | 2013-10-02 | post |
| 93 | /eaaCatalog/serviceLayer.html | 14990 | 1249 | | other |
| 94 | /articles/2023-social-media.html | 14598 | 261 | 2023-11-02 | post |
| 95 | /articles/talk-about-platforms.html | 14568 | 1150 | 2018-03-05 | post |
| 96 | /refactoring/catalog/extractFunction.html | 14516 | 1147 | | other |
| 97 | /refactoring/catalog/replaceNestedConditionalWithGuardClauses.html | 13684 | 1202 | | other |
| 98 | /articles/patterns-of-distributed-systems/wal.html | 13392 | 1236 | | other |
| 99 | /bliki/CannotMeasureProductivity.html | 12785 | 534 | 2013-08-29 | post |
| 100 | /articles/agile-threat-modelling.html | 12781 | 1067 | 2020-05-18 | post |
| 101 | /bliki/TestDrivenDevelopment.html | 12752 | 828 | 2023-12-11 | post |
| 102 | /articles/bottlenecks-of-scaleups/04-costs.html | 12717 | 862 | 2023-07-31 | post |
| 103 | /bliki/ObjectMother.html | 12564 | 996 | 2006-10-24 | post |
| 104 | /articles/2021-test-shapes.html | 12511 | 1041 | 2021-06-02 | post |
| 105 | /bliki/CommandQuerySeparation.html | 12498 | 1010 | 2005-12-05 | post |
| 106 | /articles/xapo-architecture-experience.html | 12477 | 649 | 2023-07-18 | post |
| 107 | /articles/itsNotJustStandingUp.html | 12446 | 1072 | 2016-02-21 | post |
| 108 | /tags/application%20architecture.html | 12429 | 1290 | | other |
| 109 | /bliki/BranchByAbstraction.html | 12401 | 1014 | 2014-01-07 | post |
| 110 | /articles/lean-inception/ | 12291 | 1007 | 2017-03-15 | post |
| 111 | /bliki/CodeSmell.html | 11922 | 992 | 2006-02-09 | post |
| 112 | /videos.html | 11859 | 1014 | 2015-03-02 | nav |
| 113 | /articles/bitemporal-history.html | 11518 | 858 | 2021-04-07 | post |
| 114 | /articles/demo-front-end.html | 11508 | 741 | 2023-08-23 | post |
| 115 | /bliki/InversionOfControl.html | 11451 | 936 | 2005-06-26 | post |
| 116 | /delivery.html | 11279 | 898 | | nav |
| 117 | /eaaCatalog/dataMapper.html | 11191 | 904 | | other |
| 118 | /articles/microservice-trade-offs.html | 10990 | 871 | 2015-07-01 | post |
| 119 | /bliki/PolyglotPersistence.html | 10680 | 873 | 2011-11-16 | post |
| 120 | /bliki/FlagArgument.html | 10623 | 828 | 2011-06-23 | post |
| 121 | /articles/domain-oriented-observability.html | 10252 | 765 | 2019-04-02 | post |
| 122 | /bliki/ContinuousDelivery.html | 10164 | 850 | 2013-05-30 | post |
| 123 | /books/dsl.html | 10123 | 795 | | other |
| 124 | /bliki/LocalDTO.html | 9909 | 825 | 2004-10-21 | post |
| 125 | /books/patterns-distributed.html | 9781 | 4890 | | other |
| 126 | /bliki/SelfTestingCode.html | 9498 | 652 | 2014-05-01 | post |
| 127 | /articles/architect-elevator.html | 9442 | 746 | 2017-05-24 | post |
| 128 | /articles/patterns-of-distributed-systems/lamport-clock.html | 9430 | 804 | 2021-06-23 | post |
| 129 | /articles/gateway-pattern.html | 9428 | 766 | 2021-08-10 | post |
| 130 | /eaaDev/PresentationModel.html | 9214 | 770 | 2004-07-19 | post |
| 131 | /bliki/ParallelChange.html | 9059 | 734 | 2014-05-13 | post |
| 132 | /articles/nonDeterminism.html | 8970 | 732 | 2011-04-14 | post |
| 133 | /articles/collection-pipeline/ | 8920 | 670 | 2014-07-21 | post |
| 134 | /bliki/OrmHate.html | 8824 | 577 | 2012-05-08 | post |
| 135 | /articles/access-refactoring-web-edition.html | 8703 | 719 | 2018-11-21 | post |
| 136 | /bliki/TolerantReader.html | 8529 | 716 | 2011-05-09 | post |
| 137 | /articles/replaceThrowWithNotification.html | 8507 | 696 | 2014-12-09 | post |
| 138 | /bliki/ShuHaRi.html | 8475 | 714 | 2014-08-22 | post |
| 139 | /articles/is-tdd-dead/ | 8423 | 706 | 2014-05-19 | post |
| 140 | /articles/patterns-of-distributed-systems/clock-bound.html | 8397 | 385 | | other |
| 141 | /bliki/HumbleObject.html | 8383 | 494 | 2020-04-29 | post |
| 142 | /bliki/DesignStaminaHypothesis.html | 8322 | 678 | 2007-06-20 | post |
| 143 | /articles/agileFluency.html | 8301 | 670 | 2012-08-08 | post |
| 144 | /data/ | 8244 | 666 | | nav |
| 145 | /articles/patterns-of-distributed-systems/heartbeat.html | 8068 | 524 | 2020-08-04 | post |
| 146 | /bliki/FeatureBranch.html | 7966 | 670 | 2020-05-07 | post |
| 147 | /articles/newMethodology.html | 7814 | 636 | 2005-12-13 | post |
| 148 | /articles/exploring-mastodon.html | 7650 | 222 | 2022-11-01 | post |
| 149 | /articles/class-too-large.html | 7647 | 619 | 2020-04-14 | post |
| 150 | /tags/testing.html | 7606 | 645 | | nav |
| 151 | /bliki/FluentInterface.html | 7532 | 698 | 2005-12-20 | post |
| 152 | /bliki/MaturityModel.html | 7521 | 568 | 2014-08-26 | post |
| 153 | /bliki/ExtremeProgramming.html | 7484 | 615 | 2013-07-11 | post |
| 154 | /bliki/BeckDesignRules.html | 7461 | 636 | 2015-03-02 | post |
| 155 | /articles/web-security-basics.html | 7448 | 622 | 2016-01-28 | post |
| 156 | /articles/data-mesh-accelerate-workshop.html | 7435 | 295 | 2023-01-05 | post |
| 157 | /bliki/MicroservicePrerequisites.html | 7342 | 564 | 2014-08-28 | post |

Total is the total number of page views for that path in 2023. Median is the median number of views per month. Launch date is the date the article was first published. Not all pages have a launch date, often because navigation pages don't really get this kind of launch, but also because some older articles don't have publication dates assigned to them (fixing this is on my todo list, as I examine the place of old articles). For more on the type, see the section below.

Mostly the total and monthly median track each other, but the cases where they don't are interesting, and I'll want to look at those.
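One rough way to spot the cases where total and monthly median diverge is to compare the yearly total against twelve times the monthly median. The sketch below uses four rows from the table above; the helper name and the factor of 2 are my own assumptions for illustration, not a threshold used in this analysis.

```python
def uneven_pages(pages, factor=2.0):
    """Return paths whose yearly total exceeds `factor` times
    twelve months of the monthly median."""
    return [
        path
        for path, total, monthly_median in pages
        if total > factor * 12 * monthly_median
    ]

# (path, total for 2023, monthly median) taken from the table above.
pages_2023 = [
    ("/articles/microservices.html", 156410, 12889),
    ("/bliki/CQRS.html", 146743, 12000),
    ("/bliki/Slack.html", 23184, 351),
    ("/articles/2023-social-media.html", 14598, 261),
]
print(uneven_pages(pages_2023))
```

The two older posts have totals close to twelve times their median, while the two 2023 posts have totals several times larger, which suggests a burst of traffic concentrated in a few months.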

Different types of pages

There are a few different types of pages on the site. I can classify them into three broad groups:

  • posts: these are the core content of the site: full-length articles, posts in the bliki section, infodecks, and some older pattern-oriented material.
  • navigation pages (nav): pages used primarily to help the reader find posts. These are the guide pages (eg /agile.html), including the home page, and the tags pages.
  • catalog pages: pages that are summaries of material that isn't on the site, such as the refactoring catalog, books pages, etc.

There are a few pages (eg faq.html) that don't quite fit into any of these; for this analysis I'm clumping them in with catalog pages.

The proportion of the overall traffic going to each type of page looks like this.

Most of the traffic to the site is to post pages, which are the ones with the real content of the site. But we can get more information by plotting the cumulative frequency curves for each of the three types.

How much does traffic vary with age

I've been working on this site for a long time, starting it in the late 1990s, not long before I joined Thoughtworks. Many of the articles thus go back a long way. When thinking about the age of pages on the site, it doesn't make much sense to think about navigation pages, so I concentrate on the posts: articles and bliki entries. Here's a plot of posts made each year.

The date information on the very oldest material is rather fuzzy. Some posts, particularly bliki posts, don't have a date in the source. I can figure these out by looking at version control, but before 2004 I was using CVS, and didn't feel motivated to go back to CVS files to figure out the real commit dates, since they didn't survive the transition to Subversion, Mercurial, and git.

Recall the earlier graph that shows how an article's traffic drops exponentially over time. Given this, we would assume that most older articles don't generate as much traffic as newer ones. Indeed, I've heard anecdotally that this is commonly the case, which is why so many sites bombard us with lots of new material. I've always preferred to concentrate on material with longer-lasting value, so how does this work in terms of traffic?

One way to assess this is to look at articles that get a lot of traffic. In the past I've formulated “evergreen” articles as articles that have generated at least 1000 monthly pageviews for at least six months in a year. I prefer 6 months of high monthly traffic to a high yearly total, because if an article gets suddenly noticed it can generate lots of page views for just a month or two, but not sustain it.
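The “evergreen” rule above translates directly into a small check. This is a sketch, assuming a list of twelve monthly page-view counts per article; the function name and the sample figures are hypothetical.

```python
def is_evergreen(monthly_views, threshold=1000, min_months=6):
    """True if at least `min_months` months reach `threshold` page views."""
    return sum(1 for views in monthly_views if views >= threshold) >= min_months

# Made-up monthly counts, not real site data.
steady = [1200] * 12                    # modest but sustained traffic
spike = [30000, 8000] + [500] * 10      # suddenly noticed, then fades

print(is_evergreen(steady), sum(steady))
print(is_evergreen(spike), sum(spike))
```

The spiky article has the higher yearly total but fails the test, which is the reason for preferring a count of high-traffic months over a yearly total.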

Here's the above plot, but this time marking out these evergreen articles (in green, of course).

This shows that plenty of older articles get this kind of traffic. But what proportion of traffic goes to older articles? To get a sense of the traffic of individual articles, we can plot each post's median monthly traffic for 2023 against its publication date.

To understand what proportion of traffic goes to older articles, we can make a plot of cumulative traffic.

This traffic analysis is based on traffic for November 2023. I didn't use January 2024, because that month included publishing a rewrite of the continuous integration article, a very popular article originally written in 2006. Before the rewrite it generated a couple of thousand views per month, but when the rewrite was published it generated over thirty thousand views. I felt this would distort the picture. I then didn't use December 2023, since Decembers suffer from lower traffic due to the Christmas and New Year break.

This reads in a similar way to the earlier cumulative plots. By following the 30% proportion line, we can see that 30% of the traffic comes to posts written in the last three years.

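| proportion of traffic | age |
| --- | --- |
| 9% | 0y 0m |
| 13% | 0y 0m |
| 17% | 0y 4m |
| 20% | 0y 7m |
| 25% | 1y 10m |
| 32% | 2y 11m |
| 35% | 3y 3m |
| 40% | 4y 3m |
| 46% | 4y 6m |
| 51% | 5y 9m |
| 55% | 7y 6m |
| 60% | 8y 5m |
| 68% | 9y 8m |
| 71% | 9y 10m |
| 75% | 10y 6m |
| 81% | 12y 4m |
| 85% | 13y 8m |
| 91% | 17y 1m |
| 96% | 19y 5m |
| 100% | 23y 4m |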

The rows here are roughly every 5%, but don't hit exactly the 5s because a single article can generate more than one percentage point of traffic, and thus cross the lines (something that the graph reveals). But it still tells us that 51% of post traffic comes to articles 5 years 9 months or younger.

This makes it clear how much the old articles are still read by readers today. Posts that are over five years old get about half the traffic that comes to posts on the site. Although that's the main thing to note here, I should not forget that younger posts do grab a big hunk: 13% of traffic came from brand new articles.
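The age breakdown above amounts to asking what share of post traffic goes to posts at or below a given age. A minimal sketch, assuming each post is represented as an (age in months, monthly views) pair; the helper names and the numbers are made up for illustration.

```python
def share_by_max_age(posts, max_age_months):
    """Share of total views going to posts no older than `max_age_months`."""
    total = sum(views for _, views in posts)
    young = sum(views for age, views in posts if age <= max_age_months)
    return young / total

def format_age(months):
    """Format an age in months the way the table does, e.g. 69 -> '5y 9m'."""
    years, remainder = divmod(months, 12)
    return f"{years}y {remainder}m"

# Made-up (age in months, views) pairs, not real site data.
sample_posts = [(0, 130), (10, 120), (40, 250), (100, 300), (240, 200)]

print(format_age(60), share_by_max_age(sample_posts, 60))
```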

I'm putting this together as I do more digging into the traffic, which I'm doing in the background between working on the articles. Some questions I want to examine next include:

  • How does the size of an article affect the traffic patterns?
  • What does the proportion of pages correspond to in page views (i.e. how many page views does an article need to be in the top 20%?)
  • Is there a difference between articles that I write and articles by others?


Significant Revisions

21 March 2024: published up to “A small amount of articles get most of the traffic”

01 February 2024: started drafting