This project tries to discover all TYPO3 CMS installations on the internet.
Have you ever wondered how many TYPO3 CMS installations exist? So did I. CMSCrawler claims to have found 389,920 installations (as of November 19, 2014), builtWith claims 433,853 (as of November 19, 2014), and metagenerator.info claims 52,061 (as of November 19, 2014).
This project aims to provide reliable and verifiable data. With some background knowledge of TYPO3, identification can be done better than by examining meta tags alone.
|TYPO3 CMS version||number of installations||percentage|
|Total (as of December 21, 2014)||359,281||100.00%|
|TYPO3 3.6 CMS||580||0.16%|
|TYPO3 3.7 CMS||1,888||0.53%|
|TYPO3 3.8 CMS||3,033||0.84%|
|TYPO3 4.0 CMS||5,481||1.53%|
|TYPO3 4.1 CMS||13,431||3.74%|
|TYPO3 4.2 CMS||29,105||8.10%|
|TYPO3 4.3 CMS||12,717||3.54%|
|TYPO3 4.4 CMS||26,231||7.30%|
|TYPO3 4.5 CMS (LTS)||159,153||44.30%|
|TYPO3 4.6 CMS||13,583||3.78%|
|TYPO3 4.7 CMS||46,841||13.04%|
|TYPO3 6.0 CMS||6,369||1.77%|
|TYPO3 6.1 CMS||20,067||5.59%|
|TYPO3 6.2 CMS (LTS)||20,677||5.76%|
|TYPO3 7.0 CMS||125||0.03%|
Whenever you use the data presented here, please mention the project ("T3census") and provide a link to this website. Thank you!
All source data used is publicly available. I neither have access to the corresponding data from the typo3.org infrastructure, nor have I used such data.
As a starting point, the Twitter API was used to extract URLs from tweets matching the keyword TYPO3. In addition, the Bing API was used to search for TYPO3 artefacts. For the hosts discovered this way, the surrounding CIDR blocks have been retrieved and are currently being crawled.
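Expanding a CIDR block into crawlable addresses can be sketched as follows. This is only an illustration using Python's standard `ipaddress` module, not the crawler's actual code; `candidate_hosts` is a hypothetical helper name and `192.0.2.0/29` is a documentation-range placeholder.

```python
import ipaddress
from typing import List


def candidate_hosts(cidr: str) -> List[str]:
    """Return the usable host addresses inside a CIDR block as strings."""
    # strict=False tolerates input whose host bits are set, e.g. "192.0.2.5/29"
    network = ipaddress.ip_network(cidr, strict=False)
    return [str(address) for address in network.hosts()]


# Placeholder block from the documentation range, not a real crawl target
hosts = candidate_hosts("192.0.2.0/29")
```

For a /29 block this yields six addresses, because `hosts()` leaves out the network and broadcast addresses.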
A central job server delegates work units. Clients connect to this server and ask for work units to process (IP lookup, host identification). Workers process the units and return the results.
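The division of labour between job server and workers can be illustrated with an in-process stand-in. The sketch below assumes a thread pool and a `queue.Queue` in place of a networked job server; `run_workers`, the unit names and the handlers are hypothetical, and the handlers return canned values instead of touching the network.

```python
import queue
import threading


def run_workers(units, handlers, worker_count=4):
    """Process (kind, payload) work units with a small pool of worker threads."""
    jobs = queue.Queue()
    results = queue.Queue()
    for unit in units:
        jobs.put(unit)

    def worker():
        while True:
            try:
                kind, payload = jobs.get_nowait()
            except queue.Empty:
                return  # no work units left, the worker exits
            # Dispatch on the unit kind and hand the result back
            results.put((kind, payload, handlers[kind](payload)))

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # All workers have finished, so the result queue is no longer changing
    return [results.get() for _ in range(results.qsize())]


# Placeholder handlers: canned answers, no real DNS or HTTP requests
handlers = {
    "ip_lookup": lambda host: "192.0.2.10",
    "host_identification": lambda ip: False,
}
units = [("ip_lookup", "example.org"), ("host_identification", "192.0.2.10")]
results = run_workers(units, handlers)
```

The results arrive in completion order, not submission order, so each one carries its unit kind and payload.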
Analyzing a host is done in two steps. First, a chain of identification processors decides whether a TYPO3 CMS installation has been found. When successful, a chain of classification processors tries to find out the TYPO3 version in use.
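The two-step analysis can be sketched as two processor chains. This is a minimal sketch under assumptions: the response is modelled as a plain dict, the first processor that succeeds ends its chain, and every name here (`analyze`, `has_generator`, `version_from_generator`) is hypothetical and not taken from T3census.

```python
import re
from typing import Callable, Dict, Iterable, Optional


def analyze(response: Dict,
            identifiers: Iterable[Callable[[Dict], bool]],
            classifiers: Iterable[Callable[[Dict], Optional[str]]]) -> Dict:
    """Run the identification chain, then the classification chain if it matched."""
    # Step 1: is this a TYPO3 CMS installation at all?
    if not any(processor(response) for processor in identifiers):
        return {"is_typo3": False, "version": None}
    # Step 2: ask each classifier until one reports a version
    for processor in classifiers:
        version = processor(response)
        if version is not None:
            return {"is_typo3": True, "version": version}
    return {"is_typo3": True, "version": None}


def has_generator(response: Dict) -> bool:
    # Deliberately naive identifier, for illustration only
    return "TYPO3" in response.get("html", "")


def version_from_generator(response: Dict) -> Optional[str]:
    # Matches generator strings such as "TYPO3 4.5 CMS"
    match = re.search(r"TYPO3 (\d+\.\d+) CMS", response.get("html", ""))
    return match.group(1) if match else None


page = {"html": '<meta name="generator" content="TYPO3 4.5 CMS" />'}
report = analyze(page, [has_generator], [version_from_generator])
```

An installation can be identified without being classified, so the report allows `is_typo3` to be true while `version` stays `None`.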
Identification is done by examining the DOM, the response cookies and the existence of file resources.
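The three kinds of evidence could be checked roughly like this. The concrete signals are assumptions chosen for illustration (a generator meta tag, the `fe_typo_user` frontend session cookie, and the backend login script at `/typo3/index.php`); the project's real processors may look at different things, and all function names are hypothetical.

```python
import re

# Assumed probe path; a real crawler would likely test several resources
PROBE_PATHS = ("/typo3/index.php",)

# Assumes the name attribute precedes the content attribute
GENERATOR_PATTERN = re.compile(
    r'<meta[^>]+name=["\']generator["\'][^>]+content=["\'][^"\']*TYPO3',
    re.IGNORECASE,
)


def dom_signal(html):
    """True if the markup carries a generator meta tag mentioning TYPO3."""
    return GENERATOR_PATTERN.search(html) is not None


def cookie_signal(cookie_names):
    """True if the response set the TYPO3 frontend session cookie."""
    return "fe_typo_user" in cookie_names


def file_signal(status_by_path):
    """True if any probed resource answered with HTTP 200."""
    return any(status_by_path.get(path) == 200 for path in PROBE_PATHS)


def collect_signals(html, cookie_names, status_by_path):
    """Gather all three signals so that a caller can weigh them."""
    return {
        "dom": dom_signal(html),
        "cookie": cookie_signal(cookie_names),
        "file": file_signal(status_by_path),
    }


signals = collect_signals(
    '<meta name="generator" content="TYPO3 4.5 CMS" />',
    {"fe_typo_user"},
    {"/typo3/index.php": 404},
)
```

Keeping the signals separate means a site that strips its generator tag can still be recognised by its cookie or by a reachable file resource.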
Spread the word, do not block the user agent "T3census-Crawler/*" in your infrastructure, reduce the number of items on my Amazon Wishlist, or flattr me!
Thanks to Internet Census 2012 for inspiring me, to Michael Knabe for the initial work, and to AOE media GmbH for providing a huge list of domain records.
I'm open to suggestions about what else might be of interest.
Please contact me if you do not want your infrastructure to be scanned!
% whois t3census.info