
Measuring Examination Quality Through Data Mining of Office Actions


The USPTO indicates that “excellence in work products” is one of the “pillars” of patent quality, but pursuing this objective has been difficult. First, the USPTO considers “work product” synonymous with the soundness of the examiner’s decision – without recognizing that the quality of the office action, as an expression of that decision, is a separate and equally crucial goal. Second, the USPTO struggles with its own perception that “quality” cannot be measured directly or specifically, and has relegated its metrics to satisfaction surveys, where reviewers rate work product quality “on a scale from 1 to 10.” The resulting metrics are so uninformative that even USPTO management declines to use them for anything except PR.

A different approach to quality assessment is available: data mining techniques can be applied to the contents of office actions, in order to identify patterns of examiner behavior that reflect problems with examination quality. Detecting and quantifying the occurrence of such behaviors – over the entire output of each examiner, as well as of art units and the USPTO as a whole – may yield specific, objective metrics of examiners’ compliance with examination policy and best practices.


The USPTO’s Office of Patent Quality Assurance has a mandate that is both crucial and daunting: the overall assessment and measurement of patent examination quality at the USPTO. The OPQA has been successful in establishing metrics of procedural compliance (e.g., timeliness and examiner output) – but metrics of more substantive aspects of examination quality (e.g., the accuracy of cited references, the completeness of the examiner’s explanation, and the consideration of the applicant’s arguments) have proven much harder to develop.

The unreliability of the OPQA’s quality metrics has been on display of late. First, a report by the Office of Inspector General (OIG) identified several serious problems, including deliberate moves to inflate quality metrics to implausible levels; unreasonable tolerance for errors (including an unwritten policy of marking items as “Needs Attention” rather than as measured errors); and a deliberate decision to decouple the OPQA’s findings from examiners’ performance metrics.

Second, the recent USPTO Patent Quality Summit featured a discussion of the USPTO’s efforts to achieve “Excellence in Work Product,” but the discussion was not encouraging:

  • The entire set of USPTO proposals for “excellence in work product” is directed at the quality of the examiner’s decision – i.e., “whether claims were rejected that should have been rejected.” However, the USPTO’s “work product” is not the examiner’s decision or rationale, but the office action that expresses the decision and rationale. And none of the USPTO’s proposals, not one, focus on the quality of the office action. 1

  • The USPTO’s discussion of its quality metrics focused heavily on the mathematical formula for its quality composite metric – on the scales used (“stretch goals” vs. targets), the weights of various sub-metrics within the overall metric, etc. – but much less was said about the substance of the review. As it happens, the base inputs for this assessment are satisfaction surveys by OPQA and external reviewers – i.e., subjective opinions about the quality of the office action, rated on a scale from “1 to 10.”

    Moreover, the OPQA’s description of the standard that reviewers are supposed to apply focuses on whether the result was correct – whether “claims indicated as allowable are in fact patentable.” 2 Notably absent is any consideration of whether the examiner adequately explained that decision in the office action.

    Unsurprisingly, the OPQA metrics have failed to reveal actual problems or trends, or to inform any solutions. On the contrary, the “quality” assessment has been crafted to paint a rosy portrayal of “quality” 3 – yielding a compliance score of 97%, notwithstanding Patent Trial and Appeal Board metrics indicating that 44% of appealed cases result in the reversal of at least one basis of rejection.

  • The OPQA reviews “between 6,000 and 8,000” office actions per year… for an examining corps of 12,000 examiners, collectively issuing 1.2 million office actions per year. The OPQA can therefore hope, at best, to review one office action per examiner per year – about 0.6% of the total work product of the USPTO. Beyond questions of statistical significance, such limited evaluation cannot possibly detect many types of quality problems. For example, it is not necessarily problematic, or even incorrect, for an examiner to take “official notice” of some claim elements in one case. But an examiner who does so routinely is likely using official notice as a substitute for adequate searching and examination – and that behavior cannot be detected by evaluating one office action per examiner per year.

    Because the OPQA’s current process requires detailed review by humans, it is impossible to scale this process to provide statistically significant measurement of quality. As a result, even USPTO management chooses not to use the OPQA’s quality metrics for administrative purposes.

To be clear, I credit the OPQA for trying to provide quality metrics through a systemic process, which is certainly a challenging objective. However, the current approach to assessing examiner quality is a failure – due to a basic misidentification of the USPTO’s actual work product, and a blind spot in the quality assessment of that product.

Instead, the USPTO needs a new quality evaluation process that is:

  • Comprehensive: A deep review of the examination behavior of every examiner, covering a substantial portion of each examiner’s output.
  • Objective: Based not on unsubstantiated and unreviewable ratings, but on verifiable metrics.
  • Specific: Not focused on generalized satisfaction surveys, but identifying specific examiner behaviors that advance or impede prosecution.
  • Economically scalable: Not requiring extensive new resources or budget to cover the output of the examining corps.

All of these goals can be met with currently available technology: the use of data-mining techniques to evaluate the contents of office actions, in order to detect specific examiner behaviors that are indicative of examination quality.

This process could be achieved as follows (a rough code sketch of the pipeline follows the list):

  1. Apply pattern-matching techniques to office actions in order to identify sections and boilerplate. Tag each identified item with metadata to indicate the structure and contents of the document.
  2. Identify examiner tactics and behaviors that are indicated by specific patterns arising within office actions.
  3. For each examiner, identify the incidence of each of the identified patterns within the set of office actions issued by the examiner over a specified period. Flag patterns that arise with a significant frequency. Automatically extract a small but representative sample of these portions of the office actions for review by an OPQA representative, in order to verify and provide an example of the recurring pattern.
  4. Based on this determination, assess the examiner’s proficiency according to the behaviors that the examiner does or does not exhibit. Roll up metrics for each examiner based upon the entirety of the examiner’s work, classify examiners according to the results of this analysis, and tie performance awards to their relative standing in the examining corps.
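
As a rough illustration of steps 1 and 3, the following sketch shows how office-action text (assumed to already be available as plain text) might be tagged through pattern matching, and how the incidence of each flagged behavior could be tallied per examiner. The section headings, regular expressions, and data structures are hypothetical simplifications for illustration only, not a description of any existing USPTO system.

```python
import re
from collections import Counter, defaultdict

# Hypothetical section headings commonly found in office actions (step 1).
SECTION_PATTERNS = {
    "rejections_102": re.compile(r"Claim Rejections\s*-\s*35 USC\s*§?\s*102", re.I),
    "rejections_103": re.compile(r"Claim Rejections\s*-\s*35 USC\s*§?\s*103", re.I),
    "response_to_arguments": re.compile(r"Response to Arguments", re.I),
    "allowable_subject_matter": re.compile(r"Allowable Subject Matter", re.I),
}

# Hypothetical boilerplate phrases indicating examiner behaviors of interest (step 2).
BEHAVIOR_PATTERNS = {
    "official_notice": re.compile(r"official notice", re.I),
    "ordinary_skill_boilerplate": re.compile(r"ordinary skill in the art", re.I),
    "broadest_reasonable_interpretation": re.compile(r"broadest reasonable interpretation", re.I),
}

def tag_office_action(text: str) -> dict:
    """Step 1: tag one office action with metadata about its structure and contents."""
    sections = {name: bool(pat.search(text)) for name, pat in SECTION_PATTERNS.items()}
    behaviors = {name: len(pat.findall(text)) for name, pat in BEHAVIOR_PATTERNS.items()}
    return {"sections": sections, "behaviors": behaviors}

def examiner_incidence(actions_by_examiner: dict) -> dict:
    """Step 3: tally, per examiner, the number of office actions exhibiting each behavior."""
    incidence = defaultdict(Counter)
    for examiner, action_texts in actions_by_examiner.items():
        for text in action_texts:
            tags = tag_office_action(text)
            for behavior, hits in tags["behaviors"].items():
                if hits:
                    incidence[examiner][behavior] += 1  # count actions, not raw phrase hits
    return incidence
```

Behaviors that occur in more than some threshold fraction of an examiner’s actions could then be flagged, and a small sample of the flagged passages extracted automatically for verification by an OPQA reviewer.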

As for the specific patterns – here are some examples (a code sketch illustrating a few of these detectors follows the list):

  • Copy-and-paste office actions.

    The problem: Upon receiving the applicant’s reply to an office action, an examiner takes the previous office action, updates the date, changes the status from Non-Final to Final or vice versa, and sends it out. This practice gives the impression that the examiner has not spent any significant time reviewing the case, considering the applicant’s arguments, updating the search, or performing any other meaningful work.

    The detectable pattern: Compare the contents of each office action with the preceding office action to detect a high degree of similarity.

  • Failure to respond to the substance of the applicant’s arguments.

    The problem: An examiner may either fail (or simply refuse) to acknowledge an applicant’s argument, or may acknowledge it but dismiss it as nonpersuasive. These office actions present something like the following: “Previously cited reference (A) teaches (XYZ). Response to Arguments: Applicant’s arguments have been considered, but are unpersuasive because reference (A) teaches (XYZ).”

    These types of office actions suggest that the examiner is unwilling to consider the applicant’s counter-argument – giving the impression of being immovable by evidence. These cases inevitably require the input of the supervisor or the PTAB to resolve – and even if the examiner’s rationale is correct, the examiner’s refusal to engage the applicant’s arguments demonstrates poor customer service and erodes applicants’ trust in the fairness of the patent process.

    The detectable pattern: For non-first office actions, determine that the “Response to Arguments” section is missing, terse, or consists primarily of a verbatim recital of the applicant’s argument and boilerplate references about KSR, broadest reasonable interpretation, In re Van Geuns, etc.

  • Citing the same references repeatedly in a series of office actions, despite claim amendments and clarifying arguments.

    The problem: Often, when an applicant and examiner disagree about whether a reference contains a particular teaching, the fastest solution is for the examiner to find a different reference that teaches the same point more clearly. However, some examiners are less interested in advancing prosecution than in winning the argument, and will refuse to conduct any further research or modify the rejection. This may continue over a series of office actions, even where the applicant has offered several claim amendments and arguments to clarify the point of novelty. This tendency conveys the impression that the examiner has no interest in advancing prosecution, and that only the involvement of the SPE or PTAB will compel a different result.

    The detectable pattern: Detect a lack of changes in the rejection over an extended series of office actions.

  • Refusal to cite references with due specificity.

    The problem: 37 CFR 1.104 requires that, when an examiner cites a reference, “the particular part relied on must be designated as nearly as practicable,” and that the examiner “clearly explain the pertinence of each reference.” However, this requirement is routinely ignored: many examiners cite references in a blanket manner (e.g., rejecting a claim element in view of paragraphs 0002, 0004, 0007-0014, 0026-0039, and/or 0048-0053), where the cited portion spans a dozen columns or pages of the reference. The applicant may read the entire cited portion and not find anything resembling the claim language; in some cases, it is not even clear why the examiner cited the reference at all. It is nearly impossible for the applicant to respond effectively to this type of rejection, so these cases inevitably end up before the supervisor or the appeal board.

    The detectable pattern: Examine the citation of prior art references to determine whether a specific citation is missing, or whether the citation covers an unreasonably extensive range of the reference.

  • Frequent reliance on official notice and/or the “knowledge of one of ordinary skill in the art” to gloss over defects in references.

    The problem: When presented with an argument that a reference does not teach a significant claim element, some examiners respond by simply dismissing that element as inconsequential or mundane – opting to take official notice of it, or treating it as having no significant weight or distinct meaning. The examiner may rely heavily on the “knowledge of one of ordinary skill in the art” to reject the element – thus conveying the impression that the examiner regards the element not as worthy of minimally adequate examination, but as a trivial detail that can be disregarded.

    The detectable pattern: Determine that the rationale for a prior art rejection includes a high incidence of phrases such as “knowledge of one of ordinary skill in the art” and “broadest reasonable interpretation.”

  • Low persuasiveness, as determined from applicants’ responses.

    The problem: Persuasive office actions prompt applicants to change position, either by amending the claims in substantive ways (e.g., moving dependent claims into the independent claims, or introducing new independent claims with a different focus) or by abandoning the application. Unpersuasive office actions push applicants to maintain position – for example, by presenting arguments without amendment, requesting review by a supervisor, or filing a notice of appeal. While not much can be extrapolated from the response of a particular applicant in a particular case, aggregate metrics may reveal the examiner’s overall persuasiveness – i.e., how often the examiner’s office actions prompt applicants to change position.

    The detectable pattern: Classify applicants’ reactions to office actions as either changing position (significant claim amendments, or notice of abandonment) or maintaining position (argument without amendment, or clarifying amendments that only change small portions of the claim).

  • Unproductive interviews.

    The problem: Some examiners participate in applicant interviews with the objective of advancing prosecution: achieving a better understanding of the invention; comparing the invention to the references; and reaching an agreement with the applicant about productive claim language or amendments. Other examiners participate in applicant interviews primarily because they are required to do so, and do not invest significant interest in the case.

    The detectable pattern: Determine the incidence with which an applicant interview results in a change of position by either the examiner or applicant. A low incidence of changing positions may indicate that the examiner is not participating in interviews with the goal of advancing prosecution.

  • Allowance of an application after an RCE, with substantially identical claims as presented in the previous office action.

    The problem: This is a long-known trick for inflating examiner productivity, dubbed the “triple play” 4: an examiner who is presented with allowable claims after a non-final rejection may nevertheless issue a final rejection, only to allow the claims after the applicant files an RCE. This practice allows the examiner to collect extra production counts for the same amount of work – at the applicant’s expense, in terms of USPTO fees and lost patent term.

    The detectable pattern: Detect an allowance, following an RCE, of claims that were also presented in identical or near-identical form before the preceding final office action.

  • Consistency of production over review period.

    The problem: Both the OIG Report and earlier research indicate that deadline pressures can push examiners into poor examination choices – some of which occur as examiners struggle to meet productivity targets near the end of a review period. This problem can be mitigated by encouraging examiners to maintain steady productivity throughout the review period, and by identifying examiners who routinely find themselves with a production deficit near the end of a review period.

    The detectable pattern: Develop a histogram of the examiner’s weekly output of office actions, and calculate a metric of output consistency over the review period.
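
To make a few of these patterns concrete, the following sketch shows simplified detectors for four of the behaviors described above: copy-and-paste office actions (via n-gram similarity), heavy reliance on official notice and boilerplate rationales (via phrase counting), blanket citations (via the breadth of cited paragraph ranges), and end-of-period production crunch (via a consistency score on weekly output). The thresholds, regular expressions, and function names are illustrative assumptions, not established USPTO criteria.

```python
import re
from statistics import mean, pstdev

def _shingles(text: str, n: int = 5) -> set:
    """Word n-grams used to compare two documents."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(action_a: str, action_b: str) -> float:
    """Jaccard similarity of word 5-grams between two office actions (0.0 to 1.0)."""
    a, b = _shingles(action_a), _shingles(action_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def is_copy_paste(current_action: str, previous_action: str, threshold: float = 0.8) -> bool:
    """Flag a repeat office action that largely duplicates the preceding one."""
    return similarity(current_action, previous_action) >= threshold

BOILERPLATE = re.compile(
    r"official notice|ordinary skill in the art|broadest reasonable interpretation", re.I)

def relies_on_boilerplate(rejection_text: str, threshold: int = 3) -> bool:
    """Flag a prior art rejection that leans heavily on boilerplate rationales."""
    return len(BOILERPLATE.findall(rejection_text)) >= threshold

PARAGRAPH_RANGE = re.compile(r"\[?(\d{4})\]?\s*[-–]\s*\[?(\d{4})\]?")

def has_blanket_citation(rejection_text: str, max_paragraphs: int = 10) -> bool:
    """Flag citations that sweep in an unreasonably wide range of paragraphs."""
    spans = [int(end) - int(start) + 1 for start, end in PARAGRAPH_RANGE.findall(rejection_text)]
    return any(span > max_paragraphs for span in spans)

def production_consistency(weekly_counts: list) -> float:
    """Score weekly output on a 0-1 scale: 1.0 is perfectly even output over the
    review period; lower values indicate bursty, deadline-driven output."""
    if not weekly_counts or mean(weekly_counts) == 0:
        return 0.0
    cv = pstdev(weekly_counts) / mean(weekly_counts)  # coefficient of variation
    return max(0.0, 1.0 - cv)
```

Each detector yields a per-action (or per-period) flag; dividing the number of flagged actions by the examiner’s total output yields the percentage-style metrics shown in the sample report below.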


These are but some of the specific examiner behaviors that are detectable through the application of data mining techniques to the contents of office actions. This analysis could be reported as a quarterly summary for each examiner, sent to the examiner’s SPE – such as the following:

Examination Quality Report for Examiner John Smith
Report date: April 01, 2016
Covered review period: January 01, 2016 – March 31, 2016

Summary Metrics
Metric | Examiner Score | Art Unit Average
Cases reviewed | 45 | 43
Office actions issued | 41 | 40
Allowances issued | 3 | 4
Allowance rate | 45% | 55%
Applicant-initiated interviews | 9 | 4
Examiner-initiated interviews | 2 | 1
Cases citing a rejection under 35 USC 112 paragraph 1 | 10% | 14%
Cases citing a rejection under 35 USC 112 paragraph 2 | 20% | 25%
Cases citing a rejection under 35 USC 112 paragraph 6 | 20% | 25%
Cases citing a rejection under 35 USC 101 | 60% | 54%
Cases citing a rejection under 35 USC 102 | 45% | 48%
Cases citing a rejection under 35 USC 103 | 75% | 80%

Examination Quality Metrics: Productivity and Efficiency
Metric | Examiner Score | Art Unit Average
Objections to application title | 0% | 5%
Objections to specification or abstract | 15% | 18%
Objections to figures | 20% | 14%
Inclusion of corrective recommendations in objections | 60% | 40%
Restriction requirements | 10% | 12%
Average number of office actions (non-final and final) in pending cases | 3 | 3
Cases having more than four office actions (non-final and final) | 21% | 15%
Repeat office actions (similarity score > 80%) | 22% | 14%
Repeated use of the same references despite significant claim amendments | 24% | 12%
Repeated use of the same references in more than two successive office actions | 27% | 14%
Interviews without subsequent change of position by examiner or applicant | 40% | 35%
Allowance after final rejection and RCE without significant amendment | 14% | 12%
Consistency of production over review period | 75% | 60%

Examination Quality Metrics: Clarity and Completeness
Metric | Examiner Score | Art Unit Average
Examiner interview summaries that describe substance of interview | 100% | 75%
References cited in blanket manner | 6% | 14%
Failure to respond substantively to applicant’s arguments | 5% | 16%
Statement of novelty included in notices of allowance | 80% | 55%

Examination Quality Metrics: Accuracy and Persuasiveness
Metric | Examiner Score | Art Unit Average
First action allowances | 4% | 3%
Rejections improperly made final | 1 | 1
Rejections based upon references that clearly did not qualify as prior art | 0 | 0
Number of references cited in more than 25% of examiner’s cases | 6 | 5
35 USC 103 rejections citing more than three references | 26% | 18%
35 USC 103 rejections citing Official Notice or unsupported “ordinary skill in the art” | 38% | 12%
Restriction requirements traversed rather than resolved by election | 10% | 18%
Applicants traversing prior art without claim amendments | 38% | 21%
Change of position after repeat office actions without significant claim amendment | 60% | 23%
Reversal of examiner’s objection or rejection by primary / supervisor / director | 16% | 9%
Reopening of prosecution after notice of appeal | 25% | 12%
Reversal of examiner’s objection or rejection by Patent Trial and Appeal Board | 53% | 38%
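
A report along these lines could be generated automatically once per-examiner incidence data exists. The short sketch below (with hypothetical field names and data structures) shows the basic roll-up: compute each metric as a percentage of the examiner’s office actions, average the same metrics across the art unit, and print the two columns side by side.

```python
from statistics import mean

def examiner_metrics(flags_per_action: list) -> dict:
    """Percentage of an examiner's office actions exhibiting each flagged behavior.
    Each element of flags_per_action is a dict of behavior name -> bool for one action."""
    total = len(flags_per_action)
    behaviors = flags_per_action[0].keys() if flags_per_action else []
    return {b: 100.0 * sum(flags[b] for flags in flags_per_action) / total for b in behaviors}

def quality_report(examiner: str, examiner_scores: dict, art_unit_scores: list) -> str:
    """Render a plain-text report comparing one examiner to the art unit average."""
    lines = [f"Examination Quality Report for Examiner {examiner}",
             "Metric | Examiner Score | Art Unit Average"]
    for metric, score in sorted(examiner_scores.items()):
        peers = [scores.get(metric, 0.0) for scores in art_unit_scores]
        unit_avg = mean(peers) if peers else 0.0
        lines.append(f"{metric} | {score:.0f}% | {unit_avg:.0f}%")
    return "\n".join(lines)
```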

The advantages of this process are numerous and compelling:

  • This report provides specific quality metrics that are indicative of the examiner’s efficiency, clarity, and accuracy – precisely the kinds of information that have escaped the USPTO’s review and administration to date.
  • This report precisely reveals the examiner’s strengths and weaknesses. For example, the examiner above demonstrates a good track record of completeness, but a poor track record of fairly considering applicants’ arguments and responding in ways that are deemed persuasive.
  • These metrics are not limited to a particular case, but cover the examiner’s entire portfolio, and reveal the examiner’s tendencies across all cases and applicants. It is not a problem that an examiner relies on “broadest reasonable interpretation” in one case; it is a problem if the examiner regularly relies on this principle to gloss over omissions in references.
  • These metrics can scale to cover the entire output of the USPTO examining corps, without requiring a major increase in human labor.
  • Automatically generated metrics can reduce the dependency of the OPQA on individual surveys that are prone to bias and general subjectivity.
  • Automated processes can generate reports in a timely manner to inform supervisors and directors of currently emerging trends, rather than retrospective analyses that may only reveal a trend some months or years later.
  • The USPTO could, as a service, provide a report to applicants indicating how their applications are faring within the USPTO, both overall and in comparison with the average applicant. For example, a report might notify an applicant that its applications are facing a higher-than-average restriction rate across examiners and art units, prompting the applicant to investigate the problem and reduce this incidence.

In summary – the USPTO’s efforts to achieve “excellence in work product” crucially depend on specific and accurate data that it currently does not produce. But this information is attainable with high accuracy, by applying well-understood technology to content that the USPTO already has. The transition requires only a commitment by the USPTO to improve the quality of office actions; a refocusing of the OPQA on data-driven processes; and modest IT resources to implement and maintain these practices.


Notes:

  1. The full set of USPTO proposals for improving its “excellence in work product” is:
    • Current: “Allowing” applicants to include a glossary of terminology, and promising expedited examination in return.
    • Current: A “patent application alert system,” notifying the public when “applications of interest” are published, and allowing the public to submit prior art.
    • Current: Expansion of examiner training: more technical sessions to maintain examiners’ technical proficiency, and ongoing legal training.
    • Current: The Patent Prosecution Highway – an effort to inform the USPTO examiner of patentability decisions of the same application in foreign patent offices.
    • Planned: Allowing applicants to request review of cases by higher-level administrators within a technology center.
    • Planned: Automated pre-examination prior art searches.
    • Proposed: Making the substance of interviews part of the prosecution record.

    While these proposals are generally positive for patent examination, none of them relates to the quality of the office action! None of these points – whether the examiner is better trained in the law and the technology; whether the examiner reaches a more accurate decision based on more relevant prior art and foreign examination results; whether the examiner sends out the office action faster – speaks to the clarity, completeness, and correctness of the office action itself.

    The USPTO’s failure to recognize the quality of the office action, as distinct from the quality of the decision, is apparent in its roster of examiner training sessions. The USPTO offers examiners training on technology and standards of law – but includes no mention of writing workshops. That is an odd omission, since the examiner’s primary output is a written document.

  2. Paula Hutzell, the Director of the OPQA, provided the following summary of the review process:

    In the case of allowances, the reviewers basically review the prosecution history and make a determination of whether all of the allowed claims are patentable. If any allowed claim is found to be non-patentable, that case is determined to be noncompliant.

    In the case of final rejections, we conduct a similar review to determine whether or not any claims indicated as allowable are in fact patentable. We look at all of the rejections in the office action to ensure that they are correct; we also look to determine whether there are any omitted rejections of any of the claims. … So, in the case of both final and non-final actions, we are looking for incorrect rejections, omitted rejections; in the case of final office actions, we also look to see if the indication of finality is correct; we also look to see whether any restriction requirements were correct. That’s the compliance review.

    The explanation focuses on the outcome – i.e., whether or not the reviewer agreed with it: “claims are rejected that should have been rejected.” Notably missing from this detailed summary of the OPQA’s review is any mention of the quality of the examiner’s explanation in the office action.

    Indeed, based on the OPQA’s review process, office actions could be replaced with a checklist for each claim: “[ ] Rejected under 101 [ ] Rejected under 112p1 [ ] Rejected under 112p2 [ ] Rejected under 102/103 (references: ______) [ ] Allowed.” As long as the reviewer agrees with this outcome, the OPQA’s “quality” process seems to have been satisfied.

  3. From the OIG Report:

    We were informed that OPQA reviewers may identify, but not record, some errors. This practice is not based on written policy direction. This practice reduces our confidence in the accuracy of USPTO’s official quality metric.

    The USPTO’s Composite Quality Metric is based on OPQA’s review of examiner decisions, which in turn is dependent on the number of errors identified by reviewers. For those patent actions examined by OPQA, USPTO was unable to provide an estimate on the number of errors that were recorded as “Needs Attention” instead of as an error.

  4. Credit to Electronic and Software Patents – Law and Practice, a fine textbook by the folks at Schwegman, Lundberg, and Woessner, for documenting this practice as far back as 2000.

1 Comment

Hear, hear! Well put!

I add that much of the metadata tagging can be done by the Examiners with (almost) no extra effort by using Word styles to flag sections. I suspect some Examiners already do this to save themselves time on formatting, which makes a lot of sense.

I am also waiting for the day when the USPTO provides the Examiner’s Word documents so I don’t have to OCR the PAIR PDFs. The Office has been improving its IT systems quite a bit recently, so I am hopeful.
