T3census - TYPO3 CMS census

about

This project tries to discover all TYPO3 CMS installations on the internet.

Have you ever wondered how many TYPO3 CMS installations exist? So have I. CMSCrawler claims to have found 312,955 installations (as of December 10, 2016); builtWith claims to have found 405,828 installations (as of December 10, 2016).

This project aims to provide reliable and verifiable data. With some background knowledge of TYPO3, identification can be done better than by just examining meta tags.

statistics (work in progress)

As of December 17, 2016 ("oldest" data sample: Nov 25, 2016):

TYPO3 CMS version                           number of installations   percentage
Total                                                       254,671      100.00%
TYPO3 3.6 CMS                                                   337        0.13%
TYPO3 3.7 CMS                                                 1,232        0.48%
TYPO3 3.8 CMS                                                 1,534        0.60%
TYPO3 4.0 CMS                                                 2,810        1.10%
TYPO3 4.1 CMS                                                 6,331        2.49%
TYPO3 4.2 CMS                                                13,610        5.34%
TYPO3 4.3 CMS                                                 6,065        2.38%
TYPO3 4.4 CMS                                                11,441        4.49%
TYPO3 4.5 CMS (previously LTS, also ELTS)                    75,241       29.54%
TYPO3 4.6 CMS                                                 6,762        2.66%
TYPO3 4.7 CMS                                                23,034        9.04%
TYPO3 6.0 CMS                                                 3,110        1.22%
TYPO3 6.1 CMS                                                 9,891        3.89%
TYPO3 6.2 CMS (LTS)                                          74,894       29.41%
TYPO3 7.0 CMS                                                    87        0.03%
TYPO3 7.1 CMS                                                   144        0.06%
TYPO3 7.2 CMS                                                    96        0.04%
TYPO3 7.3 CMS                                                    46        0.02%
TYPO3 7.6 CMS (LTS)                                           8,276        3.25%
TYPO3 8.0 CMS                                                     1        0.00%
TYPO3 8.1 CMS                                                     3        0.00%
TYPO3 8.2 CMS                                                     3        0.00%
TYPO3 8.3 CMS                                                 8,568        3.36%
TYPO3 8.4 CMS                                                 1,155        0.45%
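
Each percentage is a version's count divided by the total number of installations. A minimal sketch of that arithmetic, using three rows copied from the table above; the function name `share` is only illustrative and not part of the project:

```python
TOTAL = 254_671  # total installations as of December 17, 2016

# A few rows copied from the statistics table.
counts = {
    "TYPO3 4.5 CMS": 75_241,
    "TYPO3 6.2 CMS": 74_894,
    "TYPO3 7.6 CMS": 8_276,
}


def share(count: int, total: int = TOTAL) -> str:
    """Return count as a percentage of total, rounded to two decimals."""
    return f"{count / total * 100:.2f}%"


for version, count in counts.items():
    print(f"{version}: {count:,} ({share(count)})")
```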

data usage

Whenever you use the data presented here, please mention the project ("T3census") and provide a link to this website. Thank you!

data sources

All the consumed source data is publicly available. I neither have access to corresponding data from the typo3.org infrastructure, nor have I used such data.

For a start, the Twitter API has been used to extract URLs that appear alongside the keyword TYPO3. Furthermore, the Bing API has been used to search for TYPO3 artefacts. For the discovered hosts, CIDRs have been retrieved and are being crawled right now.
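
Crawling a CIDR means visiting every host address inside the block. How T3census enumerates addresses is not described here, so the following is only a sketch using Python's standard `ipaddress` module; the helper name `hosts_in_cidr` and the example range are assumptions.

```python
import ipaddress
from typing import Iterator


def hosts_in_cidr(cidr: str) -> Iterator[str]:
    """Yield every usable host address of a CIDR block as a string."""
    # strict=False tolerates input such as "192.0.2.5/29" with host bits set.
    network = ipaddress.ip_network(cidr, strict=False)
    for address in network.hosts():
        yield str(address)


# 192.0.2.0/29 lies in a documentation range (RFC 5737); it is a placeholder,
# not a range taken from the project.
targets = list(hosts_in_cidr("192.0.2.0/29"))
print(targets)
```

For IPv4 blocks like this one, `hosts()` skips the network and broadcast addresses, so a /29 yields six candidate hosts rather than eight.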

infrastructure and technique

A central job server is used to delegate work units. Clients connect to this server and ask for work units to process (IP lookup, host identification). Workers process the units and return the results.
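
The delegation pattern can be illustrated with an in-process stand-in. This is not the project's job server: it replaces the networked server with a `queue.Queue` and the clients with threads, and the unit layout (`type`, `target`) as well as the `process` placeholder are assumptions made for the sketch.

```python
import queue
import threading


def process(unit: dict) -> dict:
    """Placeholder handler; a real worker would do the lookup or HTTP request."""
    return {"unit": unit, "status": "done"}


def worker(jobs: queue.Queue, results: queue.Queue) -> None:
    """Keep asking the job queue for work units until a None sentinel arrives."""
    while True:
        unit = jobs.get()
        if unit is None:
            break
        results.put(process(unit))


def run(units: list, worker_count: int = 2) -> list:
    """Hand all units to a pool of workers and collect their results."""
    jobs: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for unit in units:
        jobs.put(unit)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker so each of them stops
    for thread in threads:
        thread.join()
    return [results.get() for _ in range(len(units))]


if __name__ == "__main__":
    print(run([
        {"type": "ip_lookup", "target": "example.org"},
        {"type": "host_identification", "target": "192.0.2.1"},
    ]))
```

Because workers pull units themselves, adding capacity only means starting more workers; the order of results is not guaranteed.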

Analyzing a host is done in two steps. First, a chain of identification processors decides whether a TYPO3 CMS installation has been found. If so, a chain of classification processors then tries to determine the TYPO3 version in use.
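
The two steps could be wired together roughly as follows. This is a sketch under assumptions, not the project's code: a response is modelled as a plain dict, a chain is assumed to stop at the first processor that succeeds, and the name `analyze` is hypothetical.

```python
from typing import Callable, Dict, List, Optional

Response = Dict[str, object]
Identifier = Callable[[Response], bool]           # True if TYPO3 was detected
Classifier = Callable[[Response], Optional[str]]  # version string or None


def analyze(
    response: Response,
    identifiers: List[Identifier],
    classifiers: List[Classifier],
) -> Dict[str, object]:
    """Run identification first; classify only when identification succeeded."""
    # Step 1: any() stops at the first identifier that reports a match.
    if not any(identify(response) for identify in identifiers):
        return {"is_typo3": False, "version": None}

    # Step 2: the first classifier that returns a version wins.
    for classify in classifiers:
        version = classify(response)
        if version is not None:
            return {"is_typo3": True, "version": version}

    # Identified as TYPO3, but no classifier could determine the version.
    return {"is_typo3": True, "version": None}
```

Keeping the two chains separate means the comparatively expensive version checks are skipped for hosts that do not run TYPO3 at all.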

Identification is done by examining the DOM, response cookies and the existence of file resources.
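
The three kinds of signal might look like this. The concrete checks T3census performs are not listed here, so everything below is an assumption for illustration: the generator meta tag as the DOM signal, TYPO3's default session cookie names as the cookie signal, and `typo3/index.php` as the probed file. The `resource_exists` callback is hypothetical and has to be supplied by the caller.

```python
from html.parser import HTMLParser
from typing import Callable, Iterable, List


class GeneratorParser(HTMLParser):
    """Collect the content of all <meta name="generator"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.generators: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name == "generator":
            self.generators.append(attributes.get("content") or "")


def dom_signal(html: str) -> bool:
    """DOM check: does a generator meta tag mention TYPO3?"""
    parser = GeneratorParser()
    parser.feed(html)
    return any("typo3" in generator.lower() for generator in parser.generators)


def cookie_signal(cookie_names: Iterable[str]) -> bool:
    """Cookie check: fe_typo_user / be_typo_user are TYPO3's default names."""
    return any(name in ("fe_typo_user", "be_typo_user") for name in cookie_names)


def file_signal(base_url: str, resource_exists: Callable[[str], bool]) -> bool:
    """File check: probe a path that a TYPO3 installation usually serves.

    The path is an assumption; resource_exists is a caller-supplied function
    (for example an HTTP HEAD request) that returns True if the URL exists.
    """
    return resource_exists(base_url.rstrip("/") + "/typo3/index.php")
```

A generator tag can be removed or faked by a site owner, which is why combining it with cookie and file checks gives a more dependable result than meta tags alone.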

applications, libraries and services in use

history

roadmap

contributions

Spread the word, do not block the user agent "T3census-Crawler/*" in your infrastructure, reduce the number of items on my Amazon Wishlist, PayPal me or Flattr me!

thanks

Thanks to Internet Census 2012 for inspiring me, to Michael Knabe for the initial work, and to AOE media GmbH for providing a huge list of domain records.

contact

mail at Marcus Krause
slack channel #t3census

I'm open to suggestions about what else might be of interest.

Please contact me if you do not want your infrastructure to be scanned!

legal notice

% whois t3census.info