For Admins
Content flagging and moderation
How automatic flagging works, configuring flag rules, reviewing the queue, and approving or returning flagged content.
Who can use this: Org admins only.
Content flagging is a safety net. Most observations and summaries written in Performance Blocks are constructive and well-intentioned. A small percentage have problems — language that could be retaliatory, content that names third parties inappropriately, profanity that doesn't belong in a professional review. Flagging catches these before they reach the employee, and gives admins a structured way to intervene.
This article covers why flagging exists, how rules work, and the workflow for reviewing flags.
What flagging is and isn't
Flagging is:
- An automated check that runs on observations and summaries when they're submitted.
- A queue of items that need human review.
- A workflow for approving, returning, or requesting edits on flagged content.
Flagging isn't:
- A blocker on filing observations. Authors can write what they want; flagging triggers admin review, not censorship at the form.
- Sentiment analysis. We're not blocking "negative" feedback — opportunity-type observations are core to the product.
- A replacement for manager judgment. Most flag rules catch a narrow set of patterns; the broader judgment about whether feedback is fair lives with the manager and the cycle approval workflow.
When flags fire
Flag rules run on:
- Observation submission — when a user clicks Save on a new observation.
- Summary submission — when a manager moves a summary from draft to pending (or to approved if admin approval is off).
- Conversation messages — when a message is sent in a conversation thread.
Flags do not run on drafts. You can write whatever you like in a draft; the check happens at the submit moment.
When a rule fires, the artifact:
- Is still saved/submitted.
- Enters a flagged status in the moderation queue.
- Is not visible to the subject of the artifact (the employee being reviewed) until an admin resolves the flag.
- May or may not be visible to the author depending on rule configuration (most rules let the author see their own flagged content; aggressive rules can hide it).
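The submit-time lifecycle above can be sketched in pseudocode. This is a hypothetical model only, not the product's actual implementation; the `Artifact` fields, `run_flag_rules`, and the inline keyword lists are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Artifact:
    body: str
    author: str
    subject: str
    status: str = "draft"            # draft -> published, or draft -> flagged
    visible_to_subject: bool = False
    visible_to_author: bool = True

def run_flag_rules(text):
    # Stand-in rule engine: returns the names of rules that matched.
    rules = {"profanity": ["damn"], "compensation": ["salary"]}
    return [name for name, words in rules.items()
            if any(w in text.lower() for w in words)]

def submit(artifact, hide_from_author_rules=frozenset()):
    """Drafts are never checked; rules run only at this submit moment."""
    fired = run_flag_rules(artifact.body)
    if fired:
        artifact.status = "flagged"          # still saved, but queued for review
        artifact.visible_to_subject = False  # subject waits for admin resolution
        # Most rules let the author see their own flagged content;
        # aggressive rules can hide it.
        artifact.visible_to_author = not any(
            r in hide_from_author_rules for r in fired)
    else:
        artifact.status = "published"
        artifact.visible_to_subject = True
    return fired

a = Artifact(body="Discussed salary expectations.", author="m1", subject="e1")
print(submit(a), a.status)   # ['compensation'] flagged
```

The key point the sketch captures: flagging never blocks the save, it only gates visibility.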
Configuring flag rules
Flag rules live in Admin → Settings → Review Flags.
Built-in rule categories
Performance Blocks ships with several built-in rule categories. Each can be enabled, tuned, or disabled.
| Category | What it catches |
|---|---|
| Profanity | Common profanity in any of the supported languages |
| Slurs | Identity-based slurs (always strongly recommended on) |
| Third-party naming | Mentions of named individuals other than the subject |
| Retaliatory pattern | Patterns associated with retaliation (e.g., negative feedback shortly after a protected activity like filing an HR complaint) |
| Compensation discussion | Mentions of specific salary numbers or comp ratios in performance feedback |
| Legal/medical discussion | Mentions of legal proceedings, medical conditions, or accommodations |
You can tune sensitivity per category — some default to "high" (most catches, more false positives), others to "low" (only obvious cases).
Sensitivity levels
| Level | Behavior |
|---|---|
| Off | The rule doesn't fire |
| Low | Only obvious matches fire |
| Medium | Moderate matches fire (default for most rules) |
| High | Aggressive matching, more false positives, fewer misses |
Higher sensitivity creates more queue volume. Start at default and adjust based on what you see.
Custom rules
You can add custom keyword-based rules:
- Open Settings → Review Flags → Custom rules.
- Click Add rule.
- Configure:
- Name — internal label.
- Keywords — terms or phrases that trigger the rule.
- Match mode — whole word, phrase, or fuzzy.
- Sensitivity — same scale as built-in rules.
- Author visibility — whether the author sees that their content was flagged.
- Save.
Use custom rules sparingly. They're easy to over-configure into a blocker on legitimate feedback. Common useful ones:
- Mentions of competitor names that should not appear in performance reviews.
- Internal project codenames that shouldn't leak to employees outside the project.
- Specific phrases your legal team has flagged as risky.
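The three match modes above can be sketched as follows, with fuzzy-match cutoffs standing in for the sensitivity scale. The cutoff values and function names are assumptions for illustration; the product's real matcher is not documented here.

```python
import re
import difflib

# Hypothetical fuzzy-match cutoffs per sensitivity level (assumption):
# lower cutoff = more aggressive matching = more false positives.
FUZZY_CUTOFF = {"low": 0.95, "medium": 0.85, "high": 0.75}

def matches(text, keyword, mode="whole_word", sensitivity="medium"):
    text_l, kw = text.lower(), keyword.lower()
    if mode == "whole_word":
        # \b keeps "comp" from firing inside "accomplished".
        return re.search(rf"\b{re.escape(kw)}\b", text_l) is not None
    if mode == "phrase":
        return kw in text_l
    if mode == "fuzzy":
        cutoff = FUZZY_CUTOFF[sensitivity]
        # Compare the keyword against each word in the text.
        return any(difflib.SequenceMatcher(None, kw, w).ratio() >= cutoff
                   for w in text_l.split())
    raise ValueError(f"unknown mode: {mode}")

print(matches("Project Falcon shipped", "falcon"))                  # True
print(matches("accomplished the goal", "comp"))                     # False
print(matches("Project Falcn shipped", "falcon", "fuzzy", "high"))  # True
```

Whole-word mode is the safest default for short keywords; fuzzy mode at high sensitivity is where over-configuration tends to bite.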
The flagged content queue
Open Admin → Flagged Content to see the queue.
Each row shows:
- The artifact (observation/summary/conversation message).
- The author.
- The subject (if applicable).
- The rule(s) triggered.
- The submitted date.
Click a row to open the full artifact with the triggering content highlighted.
Filtering
Filter by:
- Rule (which rules are firing most).
- Author.
- Subject.
- Department.
- Aging (how long since flagged).
Patterns to watch for:
- One author with many flags — coach them on feedback norms.
- One rule firing constantly — sensitivity may be too high.
- Items aging more than three days — admin coverage gap.
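The three triage patterns above can be computed mechanically from a queue export. A sketch assuming a hypothetical export of flag records with `author`, `rule`, and `flagged_at` fields:

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def triage(flags, now=None, max_age_days=3, author_threshold=5):
    """Surface the three patterns worth watching in a flag-queue export."""
    now = now or datetime.now(timezone.utc)
    by_author = Counter(f["author"] for f in flags)
    by_rule = Counter(f["rule"] for f in flags)
    stale = [f for f in flags
             if now - f["flagged_at"] > timedelta(days=max_age_days)]
    return {
        # Authors who may need coaching on feedback norms.
        "heavy_authors": [a for a, n in by_author.items()
                          if n >= author_threshold],
        # The rule producing the most volume (candidate for lower sensitivity).
        "noisiest_rule": by_rule.most_common(1)[0][0] if by_rule else None,
        # Items indicating an admin coverage gap.
        "aging": len(stale),
    }
```

The thresholds (5 flags per author, 3 days aging) are placeholders; tune them to your org's volume.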
Reviewing a flag
When you open a flagged item, you have four actions:
Approve as-is
The content is fine despite the flag. Click Approve.
The artifact transitions to its normal published state. The employee sees it. The flag remains in the audit log so you can track how often each rule produces false positives.
Use this when the rule fired on a phrase that's contextually OK. For example, "the project budget is competitive" might trip the compensation-discussion rule but is fine in context.
Return for edit
The content needs changes. Click Return for edit, write a note explaining what to change, confirm.
The author is notified. The artifact returns to draft status. The author can edit and resubmit, which re-runs flag checks.
Use this for substantive issues that the author can fix.
Reject
The content cannot be published as written and editing wouldn't fix it. Click Reject, write a note, confirm.
The author is notified. The artifact is archived in a rejected state — preserved for audit but not published.
Use this rarely. Most issues are fixable; outright rejection is for cases like a clearly retaliatory observation that shouldn't be reframed and resubmitted.
Escalate
You're not the right person to make this call. Click Escalate, pick another admin, write a note.
The escalation target is notified. The flag stays in the queue but with you no longer as the primary reviewer.
Common escalations: legal-sensitive flags to your legal contact, retaliatory-pattern flags to HR leadership.
Author visibility
By default, when an item is flagged, the author sees a banner on their own artifact: "This is under review by an admin." They can see what they wrote; they just know it's not yet published.
For the most sensitive rules (retaliatory pattern, compensation discussion), you can hide the flag from the author entirely: they see their content as if it were published, while only the admin and the subject's manager can see its real state. The subject still doesn't see it until you approve.
Hiding from the author is appropriate when surfacing the flag itself would be problematic — for example, a retaliatory-pattern flag, where alerting the author might cause them to withdraw the observation strategically.
Configure per rule in Settings → Review Flags.
Audit trail
Every flag and every moderation decision is logged. The audit log captures:
- When the flag fired.
- Which rule(s) triggered.
- Who reviewed and what action they took.
- Any notes attached.
Audit log entries for flags use the action types flag.created, flag.approved, flag.returned, flag.rejected, and flag.escalated. Filter by these to audit your moderation history.
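If you export audit entries for analysis (assumed here to be rows with `action` and `timestamp` fields; the export shape is illustrative), filtering on these action types might look like:

```python
# The five flag-related action types from the audit log.
FLAG_ACTIONS = {"flag.created", "flag.approved", "flag.returned",
                "flag.rejected", "flag.escalated"}

def moderation_history(entries):
    """Keep only flag-related audit entries, oldest first.

    `entries` is assumed to be a list of dicts with at least
    `action` and `timestamp` keys (export format is hypothetical).
    """
    flagged = [e for e in entries if e["action"] in FLAG_ACTIONS]
    return sorted(flagged, key=lambda e: e["timestamp"])
```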
See Audit logs for general audit log usage.
Coverage and rotation
For orgs with significant flag volume, single-admin coverage is risky — vacations and time zones leave gaps. Patterns that work:
- Primary + backup: assign one admin as primary, another as backup. Backup auto-takes flags aging more than 24 hours.
- Department split: divide the queue by department, with one admin per department.
- Round-robin: each new flag rotates through the admin pool. Lightweight; works for small admin teams.
Document your rotation. The Review queue does not enforce assignment, so coverage is by convention.
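Because the queue does not enforce assignment, a rotation has to live in your own runbook or tooling. A minimal round-robin sketch (admin names are placeholders):

```python
import itertools

def round_robin_assigner(admins):
    """Return a function that assigns each new flag to the next admin in the pool."""
    pool = itertools.cycle(admins)
    return lambda flag_id: (flag_id, next(pool))

assign = round_robin_assigner(["ana", "ben", "chris"])
print([assign(i) for i in range(4)])
# [(0, 'ana'), (1, 'ben'), (2, 'chris'), (3, 'ana')]
```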
False positives and rule tuning
Expect some false positives — content that triggers a rule but is fine. The acceptable rate depends on your tolerance:
- High-tolerance rules (profanity, slurs) — false positives are acceptable; you'd rather catch every real one.
- Low-tolerance rules (retaliatory pattern) — false positives are costly because they delay legitimate feedback; tune carefully.
Review your false-positive rate quarterly. If a rule's false-positive rate is over 50%, consider:
- Lowering sensitivity.
- Adding context-aware exclusions (custom rules support phrase-level exclusions).
- Disabling the rule and relying on manager judgment plus admin approval.
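Counting "approved as-is" resolutions as false positives, the quarterly per-rule rate can be computed from resolved flags. A sketch assuming hypothetical records with `rule` and `decision` fields:

```python
from collections import defaultdict

def false_positive_rates(resolved):
    """Per-rule false-positive rate.

    decision == 'approved' means the rule fired but the content was fine,
    i.e. a false positive. Records with other decisions count toward the
    total only.
    """
    totals, fps = defaultdict(int), defaultdict(int)
    for r in resolved:
        totals[r["rule"]] += 1
        if r["decision"] == "approved":
            fps[r["rule"]] += 1
    return {rule: fps[rule] / totals[rule] for rule in totals}

def needs_tuning(rates, threshold=0.5):
    """Rules whose false-positive rate exceeds the 50% guideline above."""
    return [rule for rule, rate in rates.items() if rate > threshold]
```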
Privacy and discretion
Flagged content is sensitive. Treat the moderation queue with the same discretion you'd treat any HR matter:
- Don't share flagged content casually with other admins or with managers.
- Don't discuss flag patterns in Slack channels with broader audiences.
- The audit log is queryable by other admins; assume any decision you make is reviewable.
If a flag involves potentially serious behavior (harassment, discrimination, legal exposure), escalate to People/Legal immediately. Don't try to handle it solo.
Frequently asked
Can I disable flagging entirely? Yes — in Settings → Review Flags, set every rule to Off. We don't recommend it; even a minimal config (slurs + retaliatory pattern) catches valuable cases.
Does flagging slow down feedback? Most flagged items resolve in under 24 hours. Items not flagged are unaffected.
Will Henry's outputs be flagged? Yes — Henry-generated content runs through the same rules as human-authored content. Henry is generally good about avoiding triggering language, but it's not exempt.
Can flags be used in performance reviews? No. Flag history is admin-only. We do not surface "this manager has had X flags" to anyone outside the admin team.
Can employees flag content directly? Not via this system. The intended channel for "I think this observation about me is unfair" is to talk to their manager, with skip-level or HR escalation if needed. The flag system is automated detection, not user reporting.
Next steps
- Audit logs — full audit trail including all flag activity.
- Running review cycles — how flagging interacts with cycle approval.
- Security and IT — broader privacy and data handling.