This update contains an important bugfix to handle malformed UTF-8 in user agent strings.
This update will simply ignore any user agents with malformed UTF-8, avoiding errors when trying to send updates via the API. These user agents are invalid and so there is no point undertaking any further analysis - thus they are silently discarded.
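The filtering described above can be sketched roughly as follows. This is an illustrative Python analogue, not the addon's actual PHP code; the function names `is_valid_utf8` and `filter_user_agents` are hypothetical.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def filter_user_agents(raw_agents):
    """Keep only well-formed UTF-8 user agents; malformed ones are silently dropped."""
    return [ua.decode("utf-8") for ua in raw_agents if is_valid_utf8(ua)]
```

Dropping the malformed strings before they reach the submission step means the API call never sees data it cannot encode.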
This new version also contains an additional CLI tool for importing user agents from a text file, for testing purposes.
v6 is another major rewrite of the core functionality of the addon, aimed at improving both the submission process for newly detected user agents and overall performance.
Important for v4 users: with this release, I am deprecating the v1 API - addon versions v4.x and earlier will continue to function for a while but will then start returning 404 error codes once I turn off the v1 API. Anyone still running KnownBots v4.x should upgrade as soon as possible.
Important for v5 users: the v2 API used in addon v5.x for fetching new bots will remain operational. However, I am deprecating the email-based submission system in favour of a new API-based user agent submission system. After a transition period, the inbound email system will be disabled and any emails sent to the [email protected] address will bounce back as undeliverable. Anyone still running KnownBots v5.x should either upgrade, or at least disable the "Email user agents" option in the v5.x addon options.
Important for anyone upgrading to v6: the new submission system in v6 uses an authentication process to ensure only valid submissions occur. After upgrading to v6, to continue submitting new user agents for analysis, you must first configure the authentication system - it is a very simple process - see instructions on the addon page. The options for v6 have changed - you should check them after upgrading.
The new submission system in v6 utilises the XenForo customer validation API to authenticate sites when submitting agents via our new API.
To configure the API, enter the License validation token for your site, found in the XenForo customer interface. The validation token will be sent to the XenForo customer validation API by the KnownBots system and, if valid, a KnownBots API token will be generated and returned to the requesting forum for subsequent authentication purposes.
With a validated license, the authentication process is automatic. API tokens are regenerated every 28 days and are re-authenticated automatically. Customer details are automatically purged from the KnownBots database after 30 days of inactivity (see privacy details on main addon page). Regenerating your license validation token will automatically cause API revalidation to fail and customer details to be purged - unless you re-configure the addon options with the new license validation token.
Changelog for v6:
- new CLI tool known-bots:parse to parse web server log files and display detected bots
- new CLI tool known-bots:send to send newly detected user agents to the KnownBots API for analysis
- new CLI tool known-bots:check-token to validate that the API token successfully authenticates - and optionally have the system regenerate a new API token if it has expired
- [email protected] email address is deprecated and will be removed soon - emails should no longer be sent to this address
- new configuration option to "Send user agents via API", which requires configuration by entering a XenForo license validation token. New agents are sent directly via the API and no longer by email
- the "Email user agents" option remains - but is used only for forum administrators to send themselves emails if they choose. Upgrading to v6 of the addon removes any reference to [email protected] from this configuration option.
- addon now uses v3 of the bot fetch API, which includes new functionality
- v2 of the bot fetch API remains operational for sites still using addon v5.x
- v1 of the bot fetch API is now deprecated and will soon stop functioning - sites still using addon v4.x should upgrade as soon as possible
- new functionality for the addon - a list of regex based ignore strings to remove malformed or obfuscated user agents from analysis. This also allows us to ignore user agents containing sql-injection and other forms of attack which typically flood a system with a large number of unique user agents in a short period of time.
- performance enhancement - we no longer do browser or ignored checks for user agents of users who are logged in. We assume that anyone logged in with a valid XenForo user id is using a valid browser. Note that bot detection is still run, just in case. This significantly reduces the amount of processing performed by the addon for valid users.
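The regex-based ignore list and the logged-in shortcut from the changelog above might combine along these lines. This is a rough Python sketch under stated assumptions: the patterns, the `classify` function, its return labels, and the exact order of checks are all hypothetical and are not taken from the addon's PHP source.

```python
import re

# Hypothetical ignore patterns; the real list is maintained by the addon.
IGNORE_PATTERNS = [
    re.compile(r"(union\s+select|sleep\(\d+\))", re.IGNORECASE),  # SQL injection probes
    re.compile(r"[<>{}]"),  # markup or template characters
]


def is_ignored(user_agent):
    """Return True if any ignore pattern matches the user agent."""
    return any(p.search(user_agent) for p in IGNORE_PATTERNS)


def classify(user_agent, user_id, bot_strings):
    """Classify a user agent as 'bot', 'skipped', 'ignored' or 'analyse'."""
    ua = user_agent.lower()
    # Bot detection always runs, even for logged-in users.
    for needle in bot_strings:
        if needle in ua:
            return "bot"
    # Logged-in users (assumed here to have user_id > 0) skip the
    # ignore and browser checks entirely.
    if user_id > 0:
        return "skipped"
    if is_ignored(user_agent):
        return "ignored"
    return "analyse"
```

Returning early for logged-in users is what saves the work: the more expensive ignore and browser checks only run for guests.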
v5.0.0 is a major rewrite of the core functionality of this addon aimed at improving processing speed, bot detection sophistication and greatly enhancing our ability to identify new bots.
Note that the options have changed - so please check the options after upgrading. More information about each option is provided on the main addon page.
- major rewrite - no longer use "bot|spider|crawl" search strings and false-positive lists to identify possible bots, rely instead on search strings supplied by API to identify valid browsers and store them directly in the database rather than the SimpleCache, ready for emailing
- more complete agent reprocessing - check for valid browsers and ignored agents
- change the core userAgentMatchesRobot function to use strpos instead of preg_match, it's much faster and won't fall over with extremely high numbers of bot match strings
- allow BotFetcher to be manually configured to bypass untrusted http agent - used for testing when API source is on a .local domain. Default action remains to use the untrusted http agent to allow for proxying outbound API calls.
- change email cron to daily send
- using new v2 API from KnownBots
- replace generic bots with complex (regex) based searches
- add "Fetch new bots" button to Known Bots List in admin UI
- automatically reprocess user agents after loading new bot data
- new CLI command for reprocessing user agents, including the option to force all user agents to be reprocessed
- improvements to user agent test in admin UI to be more descriptive
- bcc additional email addresses to keep them private
- bugfix: don't linkify known bot list when no links supplied
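The switch from preg_match to strpos noted in the changelog above amounts to replacing one large regular expression with a loop of plain substring checks. A minimal Python analogue is sketched below; the function name mirrors userAgentMatchesRobot, but the signature and body are assumptions for illustration only.

```python
def user_agent_matches_robot(user_agent, bot_strings):
    """Return the first bot string found in the user agent, or None."""
    ua = user_agent.lower()
    for needle in bot_strings:
        # Plain substring test; no regex is compiled, so the number of
        # bot strings is not limited by regex engine pattern sizes.
        if needle in ua:
            return needle
    return None
```

In PHP the equivalent test needs `strpos($ua, $needle) !== false`, because a match at position 0 is otherwise treated as falsy.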
This release includes additional sanity checks to prevent bad data returned from the API from breaking the forums.
If any of the data returned by the API is not in the exact format we expect, the entire download is discarded and no changes applied to the forum. An error message will be logged prompting further investigation.
After upgrading to 4.0.1, you should manually force a fetch of new API data by executing the following command from your forum root:
php cmd.php known-bots:fetch -f
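The all-or-nothing sanity check described in this release could look something like the sketch below. It is written in Python for illustration, and the payload shape (a JSON object with a "bots" map of search string to bot name) is an assumption, as are the names `parse_bot_payload` and `apply_update`.

```python
import json
import logging


def parse_bot_payload(raw):
    """Return the bots dict, or None if any part of the payload is malformed."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        return None
    if not isinstance(data, dict):
        return None
    bots = data.get("bots")  # assumed top-level key
    if not isinstance(bots, dict) or not bots:
        return None
    for key, title in bots.items():
        # A single bad entry invalidates the entire download.
        if not isinstance(title, str) or not key.strip() or not title.strip():
            return None
    return bots


def apply_update(current, raw):
    """Replace the current bot data only if the new payload is fully valid."""
    new_bots = parse_bot_payload(raw)
    if new_bots is None:
        logging.error("KnownBots: API data failed validation; download discarded")
        return current
    return new_bots
```

Rejecting the whole payload rather than skipping bad entries keeps the stored bot list in a known-good state.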
KnownBots v4 is a completely new build - bots are no longer hard coded, but are updated via API calls, and the XF code cache is used to store bot data
- raw bot data downloaded from API is stored in internal_data/knownbots.json
- new CLI tool for manually fetching bots from API (Cron task is also provided)
- new CLI tool for manually loading bots from knownbots.json
- new CLI tool for testing user agent matches
new bots discovered in June 2021
- btbot: BT Bot
- catchbot: Catchbot
- comodospider: Comodo SSL Spider
- deepnoc: deepnoc bot (network optimized crawling)
- dispenserbot: Dispenser Dab Solver Checklinks Bot
- epicbot: Epictions EpicBot
- esperanzabot: EsperanzaBot
- fast enterprise crawler: Fast enterprise crawler 6 used by Schibsted
- fleabot: Mercadopar Fleabot
- fuseonbot: Fuseon Link Affinity Bot
- google/bot: google/bot
- greenflare seo crawler: Greenflare SEO Crawler
- gsitecrawler: GSiteCrawler
- holmes: Morfeo Holmes Bot
- infotigerbot: InfoTiger Search Engine Bot
- internet security survey bot: Internet Security Survey Bot
- jetpack-bot: JetPack Bot
- mojoo robot: Mojoo Bot
- nicecrawler: NiceCrawler
- prem.moe crawler: Prem.moe Crawler
- sayindexbot: SayIndex Bot
- sbl-bot: SoftByte Labs Bot
- siteliner: Siteliner Bot
- swjschecketbot: Swjschecketbot
- trade desk ads.txt & sellers.json crawler: Trade Desk ads.txt & sellers.json crawler
- vortex: Marty Anstey Vortex Bot
- www.hlabs.co.ke: hLabs Bot
- zspider: Red Kolibri ZSpider
new false positives:
new bots:
- spider v9 phone
- 80legs.com: 80legs Crawler
- adssellerscrawler: Ads Sellers Crawler
- apesearch crawler: ApeSearch Crawler
- aportcatalogrobot: AportCatalogRobot
- bot dns-cache.fr: bot dns-cache.fr
- crusty broad web crawler: Crusty Broad Web Crawler
- docomo: NTT Docomo Goo Bot
- dubbotbot: DubBot Bot
- electricmonk: DueDil Electric Monk Crawler
- finbot: Finbot
- flok's crawler: Flok's Crawler
- goodbot: GoodBot
- grover: Grover web crawler
- joc web spider: JOC Web Spider
- linkarchiver twitter bot: LinkArchiver Twitter Bot
- linkscrawler: LinksCrawler
- neevabot: Neevabot Search Engine Bot
- newsharecounts.com: Newshare Counts
- newslitbot: Newslit Bot
- opensearch@mpdl: OpenSearch@MPDL
- orbbot: orbbot
- pajbot1: Pajbot
- richaudience brandsafety bot: Rich Audience Brandsafety Bot
- semjibot: SemjiBot
- simplecrawler/0.1: SimpleCrawler
- snap url preview service: Snap URL Preview Service
- spider_bot/3.0: Spider_bot
- spotibobot: Spotibo bot
- synologychatbot: Synology Chat Bot
- terrawizbot: TerrawizBot
- top100.rambler.ru crawler: Top-100 Rambler Crawler
- webshopchecker bot: Webshop Checker Bot
- wi job roboter spider: Web Integration Job Robot
- woobot: WooBot
- woriobot: Zite.com WorioBot
- wp fastest cache preload bot: WP Fastest Cache Preload Bot
- xenforo: XenForo
- your bot: Your Bot
- yunsecuritybot: YunSecurityBot
- yurichevbot: Yurichev Bot
new false positives:
new bots:
- cubot; j5
- baiduboxapp
- 200pleasebot
- a8bot
- abilogicbot
- acoonbot
- adform robot
- arhpostbot
- atomseobot
- awariorendererbot
- badoobot
- bl.uk_ldfc_bot
- brobot
- charityengine bot
- charlotte
- cosmos
- coveobot
- crawlbot/1.0.0
- cxensebot
- facebot
- fandomopengraphbot
- freshpingbot
- fuelbot
- geedobot
- getlocalbot
- google-safety
- gpcsupbot
- grub-client
- gynxbot
- healrworld crawler
- hgfalphaxcrawl
- hoodle crawler
- idmarch automatic
- imrbot
- jambot
- justlocal.nl
- kantarsifomediaauditbot
- keobsbot
- keybasebot
- koepabot
- lanaibot
- landsbokasafn
- lapozzbot
- linkpulse metacrawler
- linksmanager.com_bot
- lxrbot
- mbot v
- moreoverbot
- netpeakspiderbot
- www.niraiya.com
- node/simplecrawler
- nu.marginalia.wmsa.edge-crawler
- nutchcvs
- oer commons bot
- omniexplorer_bot
- onefuncbot
- oozbot
- pickybot
- piepmatz bot
- plukkie
- pu_in crawler
- punkspider
- pwa-crawler
- reasonalbot
- revuebot
- runet-research-crawler
- screenerbot crawler
- searchenginecrawler
- sebot-wa
- seekbot
- shopwiki
- showyoubot
- siteauditbot
- sitescorebot
- spinn3r
- squirrobot
- ssl-crawler
- thinkbot
- tsmbot
- tweetedtimes bot
- ucrawl
- umichbot
- urlappendbot
- verticalleap-sitestatusbot
- webgraph
- weblinkchecker
- websquash.com
- wellknownbot
- wizenozebot
new false positives:
weekly bot updates:
- cubot r11
- spider v7 build/lmy47i
- spider v7 (MyCell Spider v7 from Bangladesh)
v3.16.0 weekly bot updates
- adbot/1.0
- ahrefssiteaudit
- amazonadbot
- amg-bot
- ampxfbot
- anybot
- aranea web-crawled corpora project
- backlinkcrawler
- botelaire
- chirpyhubbot
- cookiebot
- crawl/1.0
- dataforseobot
- diffeobot
- digitalshadowsbot
- discoverbot
- dmasslinksafetybot
- domcopbot
- echocrawl
- emeraldshield.com webbot
- historyspider
- iplogger crawler
- mohawk crawler
- mtrobot
- mxbot
- netresearchserver
- openindexspider
- pinllc search robot
- rustbot
- sellers.guide crawler by primis
- snaplocalbot
- test-deep-cocrawler
- vdo.ai bot
- wesee:search
- whizebot
- xaldon_webspider
- xovibot
- yaanibot
- yelpbot
weekly new bots:
- boitho.com-dc
- msc crawl project radboud university
- niocbot
- open web analytics bot
- quetextbot
- rc-crawler
- tokenspider
- womlpefactory
- yeti/1.0
new Dinobot Android TV false positives:
- dinobot 4k plus
weekly new bots:
- cis455crawler
- crystalsemanticsbot
- discoverspider
- envolk[its]spider
- geograph linkcheck bot
- gg peekbot
- iccrawler
- mybot
- psbot
- suggybot
- testbot