Our methodology
Every score on StackArbiter comes from the same rubric, applied the same way, to every tool we test. Here's exactly how we do it — no black boxes.
How we test every tool
The same six steps, every time. No skipping, no reviewing from the brochure.
We open a real account
We sign up using a real email, go through the actual onboarding, and run realistic workflows — invoicing, project setup, reporting. We never review a tool from its marketing page alone.
We score on 6 axes
Each tool is graded 0–5 on six weighted criteria. The same rubric, the same questions, every time. We document our notes during testing so scores are traceable — not recalled from memory.
We cross-check user data
Our hands-on score is validated against aggregated user reviews from G2, Capterra, and Trustpilot. Where our experience diverges significantly from user consensus, we investigate and note why.
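One way to picture this cross-check is as a comparison on a common scale. The Python sketch below is illustrative only: the function name, the 1.0-point threshold, the unweighted average across platforms, and the sample ratings are all assumptions made for the example, not documented parts of the process. It assumes the hands-on score is on the 10-point scale and that each platform reports a 5-star average.

```python
from statistics import mean

# Hypothetical threshold: how far (on the 10-point scale) the hands-on score
# may sit from user consensus before the gap counts as significant.
DIVERGENCE_THRESHOLD = 1.0


def check_divergence(hands_on_score: float, user_ratings: dict[str, float]) -> dict:
    """Compare a 10-point hands-on score with 5-star user ratings."""
    # Unweighted mean of the platform averages, rescaled from 5 to 10 points.
    consensus = mean(user_ratings.values()) * 2
    gap = hands_on_score - consensus
    return {
        "consensus": round(consensus, 2),
        "gap": round(gap, 2),
        "needs_investigation": abs(gap) >= DIVERGENCE_THRESHOLD,
    }


# Made-up ratings, for illustration only.
result = check_divergence(7.4, {"G2": 4.5, "Capterra": 4.6, "Trustpilot": 4.1})
print(result)
```

A real check might weight each platform by its review count; the plain average here just keeps the sketch short.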
We name a winner
Every comparison ends with one named pick for a specific use case. If we genuinely can't separate two tools, we say so explicitly — we don't hide behind "it depends" and call it done.
We verify pricing quarterly
Prices change. We re-check every pricing page every quarter and update the scores if value-for-money shifts materially. Each page shows the last verification date so you always know how fresh the data is.
We publish corrections
When a tool changes significantly, when readers flag an error, or when a new competitor shifts our verdict, we update the review and note the change publicly. Scores aren't set in stone.
The 6 scoring criteria
Each axis is scored 0–5. The final score is a weighted average. Weights reflect how much each factor affects real-world value for a business user.
Setup & Onboarding
How fast can a new user go from signup to a first meaningful output, such as an invoice sent, a project created, or a report run? Complexity and friction are penalized.
- Time from signup to first useful action
- Data import tools (CSV, from competitors)
- Quality of onboarding wizard or in-app guidance
- Complexity of initial configuration
Day-to-Day UX
The heaviest-weighted axis. A tool you hate using on Tuesday will be abandoned by Thursday. We test the core daily workflows the product is built for.
- Navigation clarity and information density
- Mobile app quality (not just "mobile-friendly")
- Speed of the most common actions
- Consistency and polish across the interface
Feature Depth
Does the tool do what it claims — and does it do it completely? We test edge cases, not just the happy path. Integrations and API quality count here.
- Core feature completeness vs. category standards
- Integration ecosystem (native and via Zapier/API)
- Reporting and analytics depth
- Handling of edge cases and complex workflows
Customer Support
We test support channels directly — submitting real tickets and measuring response time, quality, and resolution. "24/7 support" claims are verified, not taken at face value.
- Available channels (chat, phone, email, community)
- Measured first-response time
- Quality of the help documentation
- Support availability by plan tier
Price-to-Value
Not "is it cheap" but "is what you get worth what you pay." Scored relative to the category average. A $200/month tool can score higher than a $20/month tool if it delivers proportionate value.
- Features per dollar vs. category median
- Free tier or trial generosity
- Pricing transparency (no hidden fees)
- Scaling cost as team or usage grows
Data Portability
Can you leave? Vendor lock-in is a real cost. We test data exports on day one — before we're emotionally invested. A tool that traps your data loses points regardless of everything else.
- Export completeness (all data, not just summaries)
- Standard formats (CSV, JSON, industry-specific)
- Migration path to common competitors
- Account deletion and data deletion process
How the final number is calculated
Each axis score (0–5) is multiplied by its weight, summed, then scaled to a 10-point final score. The formula is the same for every tool in every category.
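As a rough illustration, the calculation can be sketched in Python. The weights below are hypothetical placeholders, not the published values. The only constraints taken from the text are that each axis is scored 0–5, that Day-to-Day UX carries the largest weight, and that the result is scaled to 10 points. The sketch normalizes by the total weight, so a weighted average of 5 maps to a final score of 10.

```python
# Hypothetical weights for illustration only; the text states just that
# Day-to-Day UX is the heaviest-weighted axis. They sum to 1.0.
WEIGHTS = {
    "setup_onboarding": 0.15,
    "day_to_day_ux": 0.25,
    "feature_depth": 0.20,
    "customer_support": 0.15,
    "price_to_value": 0.15,
    "data_portability": 0.10,
}

MAX_AXIS_SCORE = 5
FINAL_SCALE = 10


def final_score(axis_scores: dict[str, float]) -> float:
    """Return the weighted average of the axis scores, scaled to 10 points."""
    if set(axis_scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the six axes")
    for axis, score in axis_scores.items():
        if not 0 <= score <= MAX_AXIS_SCORE:
            raise ValueError(f"{axis} score must be between 0 and 5")
    total_weight = sum(WEIGHTS.values())
    weighted_avg = sum(WEIGHTS[a] * s for a, s in axis_scores.items()) / total_weight
    return round(weighted_avg / MAX_AXIS_SCORE * FINAL_SCALE, 1)


# Made-up axis scores for a fictional tool.
example = {
    "setup_onboarding": 4,
    "day_to_day_ux": 5,
    "feature_depth": 3,
    "customer_support": 4,
    "price_to_value": 3,
    "data_portability": 2,
}
print(final_score(example))  # 7.4
```

With these placeholder weights, the weighted average of the example is 3.7 out of 5, which scales to 7.4 out of 10.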
What we will and won't do
Our affiliate relationships fund the site. Here's exactly where the line is.
What we do
- Earn affiliate commission when readers sign up through some of our links
- Disclose affiliate relationships clearly on every page where they exist
- Apply the identical scoring rubric to all tools regardless of affiliate status
- Rank tools by score — affiliate tools rank lower if they score lower
- Update scores when tools improve or deteriorate, regardless of relationship
- Accept correction requests from vendors if they include verifiable facts
What we never do
- Accept payment to improve a tool's score or ranking position
- Give a tool a higher score because it has a higher affiliate commission
- Allow vendors to see or influence our scores before publication
- Add "Editor's Choice" or similar labels in exchange for compensation
- Remove negative findings from a review at a vendor's request
- Recommend a tool we believe is genuinely worse for the reader's use case
How we keep data fresh
A review written in 2023 and never updated is a liability, not an asset. Here's our update schedule.