Several noninvasive tests have been developed for diagnosing bladder outlet obstruction (BOO) in men to avoid the burden and morbidity associated with invasive urodynamics. The diagnostic accuracy of these tests, however, remains uncertain.
To systematically review available evidence regarding the diagnostic accuracy of noninvasive tests in diagnosing BOO in men with lower urinary tract symptoms (LUTS) using a pressure-flow study as the reference standard.
The EMBASE, MEDLINE, Cochrane Database of Systematic Reviews, Cochrane Central, Google Scholar, and WHO International Clinical Trials Registry Platform Search Portal databases were searched up to May 18, 2016. All studies reporting diagnostic accuracy for noninvasive tests for BOO or detrusor underactivity in men with LUTS compared to pressure-flow studies were included. Two reviewers independently screened all articles, searched the reference lists of retrieved articles, and performed the data extraction. The quality of evidence and risk of bias were assessed using the QUADAS-2 tool.
The search yielded 2774 potentially relevant reports. After screening titles and abstracts, 53 reports were retrieved for full-text screening, of which 42 (recruiting a total of 4444 patients) were eligible. Overall, the results were predominantly based on findings from nonrandomised experimental studies and, within the limits of such study designs, the quality of evidence was typically moderate across the literature. Differences in noninvasive test threshold values and variations in the urodynamic definition of BOO between studies limited the comparability of the data. Detrusor wall thickness (median sensitivity 82%, specificity 92%), near-infrared spectroscopy (median sensitivity 85%, specificity 87%), and the penile cuff test (median sensitivity 88%, specificity 75%) were all found to have high sensitivity and specificity in diagnosing BOO. Uroflowmetry with a maximum flow rate of <10 ml/s was reported to have lower median sensitivity and specificity of 68% and 70%, respectively. Intravesical prostatic protrusion of >10 mm was reported to have similar diagnostic accuracy, with median sensitivity of 68% and specificity of 75%.
According to the literature, a number of noninvasive tests have high sensitivity and specificity in diagnosing BOO in men. However, although the majority of studies have a low overall risk of bias, the available evidence is limited by heterogeneity. While several tests have shown promising results regarding noninvasive assessment of BOO, invasive urodynamics remain the gold standard.
Urodynamics is an accurate but potentially uncomfortable test for patients in diagnosing bladder problems such as obstruction. We performed a thorough and comprehensive review of the literature to determine if there were less uncomfortable but equally effective alternatives to urodynamics for diagnosing bladder problems. We found that some simple tests appear to be promising, although they are not as accurate. Further research is needed before these tests are routinely used in place of urodynamics.
Lower urinary tract symptoms (LUTS) are prevalent and bothersome in men and women of all ages. Determining whether these symptoms are due to bladder outflow obstruction (BOO) is important in determining the optimal management. Indeed, the success rate for surgical procedures such as transurethral resection of the prostate is presumed to be superior in patients with urodynamically documented BOO. However, it is not possible to reliably diagnose BOO on the basis of clinical symptoms alone, and the gold standard for diagnosis is urodynamic assessment via a pressure-flow study. This is an invasive test that carries risks of bothersome urinary symptoms, haematuria, and urinary tract infection. Furthermore, it can be unpleasant, with considerable rates of anxiety and embarrassment. It also requires dedicated equipment and specific expertise, and is expensive. Consequently, a number of noninvasive tests have been described to replace the pressure-flow study in diagnosing BOO in men with LUTS. The objective of this systematic review (SR) is to determine the diagnostic accuracy of noninvasive tests in diagnosing BOO in men with LUTS with reference to the gold standard of invasive urodynamics.
We used standard methods recommended by the Cochrane Methods Group for the Systematic Review of Screening and Diagnostic Tests, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), and the Standards for Reporting Diagnostic Accuracy Studies (STARD). The study protocol was published on PROSPERO (CRD42015019412).
An experienced research librarian collaborated in planning the search strategy. The EMBASE, MEDLINE, Cochrane Database of Systematic Reviews, Cochrane Central (Cochrane HTA, DARE, HEED), Google Scholar, and WHO International Clinical Trials Registry Platform Search Portal databases were searched up to May 18, 2016. Only English language articles were included. The detailed search strategy is described in the Supplementary material. Additional sources for articles included the reference lists of the studies included and clinical content experts (EAU Male LUTS Guideline Panel). Two reviewers (SM and RU) screened all abstracts and full-text articles independently. Disagreement was resolved by discussion; if no agreement was reached, a third independent party acted as an arbiter (AKN).
All types of studies (including at least 10 participants) assessing the diagnostic accuracy of noninvasive tests using invasive urodynamics as a reference standard were eligible.
Eligible study populations recruited adult men (≥18 yr) with LUTS (as defined by the study authors). Studies in which the proportion of men with either neurologic disease or urethral stricture was >10% were excluded.
The following noninvasive tests (ie, index tests) were eligible for inclusion. A detailed description of each index test is included in the Supplementary material.
Intravesical prostate protrusion (IPP)
Detrusor/bladder wall thickness measured on transabdominal ultrasound (DWT/BWT)
Ultrasound-estimated bladder weight (UEBW)
Near-infrared spectroscopy (NIRS)
Penile cuff test (PCT)
External condom catheter method
The primary outcome measures for diagnostic accuracy in predicting BOO were sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Secondary outcome measures included test reliability and reproducibility, adverse events, patient satisfaction, and cost effectiveness as defined by the trial authors, if reported.
The risk of bias (RoB) in the studies included was assessed using the QUADAS-2 tool. The tool consists of four domains: patient selection, index test, reference standard, and flow and timing (the flow of patients through the study and the timing of the index test and reference standard). RoB was assessed for each domain, and the first three domains were also assessed for concerns regarding applicability.
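As a rough illustration of how such judgements could be recorded per study, the sketch below uses a hypothetical `Quadas2Assessment` container. The class, its field names, and the example study are assumptions made for illustration and are not taken from the review.

```python
from dataclasses import dataclass
from enum import Enum


class Judgement(Enum):
    LOW = "low"
    HIGH = "high"
    UNCLEAR = "unclear"


@dataclass
class Quadas2Assessment:
    """Hypothetical record of QUADAS-2 judgements for a single study."""

    study: str
    # Risk of bias (RoB) is judged in all four domains.
    rob_patient_selection: Judgement
    rob_index_test: Judgement
    rob_reference_standard: Judgement
    rob_flow_and_timing: Judgement
    # Applicability concerns are judged in the first three domains only.
    app_patient_selection: Judgement
    app_index_test: Judgement
    app_reference_standard: Judgement

    def rob_domains_not_low(self) -> list[str]:
        """Names of the RoB domains judged high or unclear."""
        domains = {
            "patient selection": self.rob_patient_selection,
            "index test": self.rob_index_test,
            "reference standard": self.rob_reference_standard,
            "flow and timing": self.rob_flow_and_timing,
        }
        return [name for name, j in domains.items() if j is not Judgement.LOW]


# Made-up example: blinding of the reference standard was not reported.
example = Quadas2Assessment(
    study="Hypothetical study",
    rob_patient_selection=Judgement.LOW,
    rob_index_test=Judgement.LOW,
    rob_reference_standard=Judgement.UNCLEAR,
    rob_flow_and_timing=Judgement.LOW,
    app_patient_selection=Judgement.LOW,
    app_index_test=Judgement.LOW,
    app_reference_standard=Judgement.LOW,
)
print(example.rob_domains_not_low())
```

Keeping applicability separate from RoB mirrors the structure of the tool: a study can be internally sound yet recruit men who are not representative of routine practice.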
A list of the most important potential confounders for outcomes was developed a priori with clinical content experts (EAU Non-neurogenic Male LUTS Guideline Panel). The confounder assessment consisted of whether each prognostic confounder was considered and whether, if necessary, the confounder was controlled for in the analysis. The potential confounding factors assessed were: (1) whether indices for a pressure-flow study were determined automatically or manually; (2) whether the quality of the urodynamic study adhered to contemporaneous quality standards (International Continence Society [ICS] standards for studies from 2002 onwards; for studies before 2002, judgment was made by the reviewer and panel member).
Owing to the expected heterogeneity in the definitions, thresholds, and technical variations of the index tests, a qualitative (ie, narrative) synthesis of all included studies was planned. For studies with multiple publications, only the most up-to-date or complete data for each outcome were analysed.
Subgroup analyses were planned for the following groups if data were available: high versus low prevalence of benign prostatic enlargement (BPE); men with a high prevalence of detrusor underactivity; men with storage versus voiding LUTS; severity of LUTS; men with previous prostate surgery; men treated with medical therapy for storage and/or voiding LUTS; and risk factors for BPE (prostate-specific antigen, prostate volume, post-void residual urine).
For each study, the elements of diagnostic accuracy were determined using a two-by-two contingency table consisting of true positive (TP), false positive (FP), false negative (FN), and true negative (TN) rates based on data reported by the study authors. If there was discrepancy between the observed data (TP, FP, FN, and TN) and derived data (sensitivity, specificity, PPV, and NPV), the observed data took priority, and diagnostic accuracy elements were calculated from the observed data as reported by the authors. In addition, descriptive statistics including median, interquartile range, and range were provided for all diagnostic accuracy elements for each type of index test considered as a whole to provide a summary measure across studies. Sensitivity analysis was planned for each type of index test using the most commonly used threshold values relevant to each test only.
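The calculation from a two-by-two table can be sketched as follows. The function names, the one-point tolerance, and all counts are illustrative assumptions; none of the numbers come from an included study.

```python
from typing import Optional


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict[str, Optional[float]]:
    """Accuracy measures (in %) from the counts of a two-by-two table."""

    def pct(numerator: int, denominator: int) -> Optional[float]:
        # Undefined when a margin of the table is empty.
        if denominator == 0:
            return None
        return round(100 * numerator / denominator, 1)

    return {
        "sensitivity": pct(tp, tp + fn),  # test positive among men with BOO
        "specificity": pct(tn, tn + fp),  # test negative among men without BOO
        "ppv": pct(tp, tp + fp),          # BOO among test positives
        "npv": pct(tn, tn + fn),          # no BOO among test negatives
    }


def discrepancies(
    reported: dict[str, float],
    derived: dict[str, Optional[float]],
    tolerance: float = 1.0,
) -> list[str]:
    """Measures whose reported value differs from the derived one by more than `tolerance` points."""
    return [
        name
        for name, value in reported.items()
        if derived.get(name) is not None and abs(value - derived[name]) > tolerance
    ]


# Hypothetical counts, not taken from any included study.
derived = diagnostic_accuracy(tp=45, fp=8, fn=15, tn=32)
print(derived)

# Hypothetical reported values; only the PPV disagrees with the counts.
print(discrepancies({"sensitivity": 75.0, "ppv": 90.0}, derived))
```

When a discrepancy is flagged, the values derived from the counts would be the ones carried forward, in line with the rule that observed data take priority.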
The study selection process is outlined in Fig. 1. A total of 42 studies were eligible for inclusion: 41 nonrandomised experimental studies and one retrospective comparative study [6–47].
The baseline characteristics for all 42 studies included are shown in Table 1. A total of 4444 patients were recruited.
|Study||Pts ( n )||Study design||Index test||Threshold value||Blinding||BOO definition for reference standard|
|Abdel-Aal 2011 ||85||NRE||DWT||2 mm||Yes||BOOI >40|
|Combination IPP + DWT||8 mm + 2 mm|
|Aganovic 2004 ||102||NRE||Uroflowmetry||10 ml/s||NR||LPURR >2 or >3|
|LPURR >2 + URA >29|
|Qmax <15 and PdetQmax >50|
|Aganovic 2012 ||111||NRE||IPP||10 mm||NR||BOOI >40|
|Aganovic 2012 ||112||NRE||IPP||12 mm||NR||BOOI >40|
|Combination IPP + BOON||10 mm, −30||BOOI >40|
|BOON2||−47.4 and −50||URA >29|
|Belenky 2003 ||29||NRE||Doppler ultrasound||RI T > 0.05||Yes||BOOI >40|
|Bianchi 2014 ||48||NRE||PCT||Griffiths nomogram||No||BOOI >40|
|Botkor-Rasmussen 1999 ||29||NRE||Uroflowmetry||10 ml/s||No||BOOI >40|
|Chia 2003 ||200||NRE||Uroflowmetry||10 ml/s||Yes||BOOI >40|
|Chung 2010 ||33||NRE||NIRS pattern on free flow||Downward pattern||No||BOOI >40|
|NIRS pattern on pressure-flow study||Downward pattern|
|Dicuio 2005 ||25||NRE||IPP||10 mm||No||DAMPF score|
|El Saied 2013 ||50||NRE||DWT||2 mm||Yes||BOOI >40|
|Prostate volume||25 ml|
|Franco 2010 ||100||NRE||IPP||12 mm||Yes||BOOI >40|
|Prostate height||40 mm|
|Prostate volume||38 ml|
|Griffiths 2005 ||144||NRE||PCT||Griffiths nomogram||No||BOOI >40|
|Han 2011 ||193||NRE||Corrected UEBW (UEBW/BSA)||27.86 g||NR||BOOI >40|
|Harding 2004 ||101||NRE||PCT||PCR index 160%||Yes||BOOI >40|
|Hirayama 2002 ||36||NRE||Uroflowmetry||10 ml/s||NR||BOOI >40|
|Kazemeyni 2015 ||51||NRE||PCT||Griffiths nomogram||NR||BOOI >40|
|Keqin 2007 ||206||RS||IPP||8.5 mm||NR||BOOI >40|
|Kessler 2006 ||102||NRE||DWT||1.5, 2, 2.5, 2.9 mm||No||BOOI >40|
|Kojima 1997 ||65||NRE||UEBW||35 g||No||BOOI >40|
|Ku 2009 ||212||NRE||Uroflowmetry||10, 12, 15 ml/s||No||BOOI >40|
|Residual fraction||10%, 20%, 30%|
|Kuo 1999 ||324||NRE||Uroflowmetry||10 ml/s||No||PdetQmax >50|
|Lim 2006 ||95||NRE||IPP||10 mm||NR||BOOI >40|
|Prostate volume||40 ml|
|MacNab 2008 ||55||NRE||NIRS||NIRS algorithm||No||Not defined|
|Madersbacher 1997 ||253||NRE||Uroflowmetry||5 ml/s||No||LPURR >2|
|Manieri 1998 ||170||NRE||BWT||5 mm||Yes||URA >29|
|Matulewicz 2015 ||19||NRE||PCT||Modified ICS nomogram||No||NR|
|Oelke 2002 ||70||NRE||DWT||2 mm||NR||CHESS|
|Oelke 2007 ||160||NRE||DWT||2 mm||Yes||BOOI >40|
|Uroflowmetry||10 and 15 ml/s|
|Ozawa 2000 ||22||NRE||Doppler ultrasound||VR >1.6||NR||BOOI >40|
|Pascual 2011 ||39||NRE||MLL||10.5 mm||No||BOOI >40|
|Pel 2002 ||56||NRE||External condom catheter||Q max /P ext max||No||BOOI >40|
|Poulsen 1994 ||153||NRE||Uroflowmetry||10 ml/s||No||BOOI >40|
|Reis 2008 ||42||NRE||IPP||10 and 5 mm||Yes||BOOI >40|
|Reynard 1996 ||148||NRE||Uroflowmetry||10 ml/s 1st void||No||BOOI >40|
|Uroflowmetry – multiple||10 ml/s 4th void|
|Reynard 1998 ||897||NRE||Uroflowmetry||10 ml/s||No||Schäfer nomogram|
|Salinas 2003 ||93||NRE||PCT||Nomogram described||Yes||BOOI >40|
|Stothers 2010 ||64||NRE||NIRS||CART model||No||BOOI >40|
|Sullivan 2000 ||90||NRE||Penile compression release||PCR 100%||NR||VPG >5 cm H2O|
|Watanabe 2002 ||51||NRE||Prostate volume and H:W||30 ml and 0.8||No||LPURR ≥3|
|Yurt 2012 ||53||NRE||NIRS||NIRS algorithm||No||BOOI >40|
|Zhang 2013 ||87||NRE||NIRS||NIRS algorithm||Yes||BOOI >40|
|Uroflowmetry + PVR||10 ml/s and 100 ml|
A summary of the methodological quality and RoB assessments is shown in Figure 2. The majority of studies had low RoB in terms of applicability, with most studies including men representative of those who would be expected to undergo this test in routine practice. The study by Botkor-Rasmussen et al included a larger proportion of asymptomatic or minimally symptomatic men compared to the other studies, and Sullivan et al included some normal volunteers, which could affect the applicability of the accuracy results obtained. Hirayama et al included only men with a small prostate (<20 ml), which is not a representative sample of those who would receive the test in clinical practice, and Kuo et al used a definition of BOO on urodynamics (detrusor pressure at peak flow rate >50 cm H2O) that is not widely accepted and may therefore affect the accuracy of the results.
The principal source of bias across studies was related to reporting of the reference standard. Although the ICS nomogram is now widely accepted to define BOO on voiding cystometry, a number of studies used different definitions of BOO, which may affect the diagnostic accuracy results obtained. Furthermore, some studies classified both equivocal and non-obstructed patients into the same non-obstructed group, which may introduce an element of bias into the overall results. In addition, blinding to the index test and reference standard was either not clearly discussed or was not performed in a number of studies, again accounting for an unclear or high RoB in data interpretation across studies. In the studies assessing NIRS, the index test and reference standard had to be undertaken simultaneously, which introduces a RoB because the same investigator analysed the results of both tests at the same time.
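To make the handling of equivocal patients concrete, the sketch below computes the bladder outflow obstruction index and dichotomises it in two ways. The formula BOOI = PdetQmax − 2 × Qmax and the cutoffs (>40 obstructed, 20–40 equivocal, <20 unobstructed) are the standard ICS definitions rather than something stated in this review, and the function names and example values are hypothetical.

```python
from typing import Optional


def booi(pdet_qmax: float, qmax: float) -> float:
    """BOOI = PdetQmax - 2 * Qmax (PdetQmax in cm H2O, Qmax in ml/s)."""
    return pdet_qmax - 2 * qmax


def classify(booi_value: float) -> str:
    """ICS categories: >40 obstructed, 20-40 equivocal, <20 unobstructed."""
    if booi_value > 40:
        return "obstructed"
    if booi_value >= 20:
        return "equivocal"
    return "unobstructed"


def reference_positive(booi_value: float, exclude_equivocal: bool = False) -> Optional[bool]:
    """Dichotomise the reference standard.

    By default equivocal patients are pooled with the unobstructed group,
    as some of the included studies did; with exclude_equivocal=True they
    are dropped from the analysis (returned as None) instead.
    """
    category = classify(booi_value)
    if category == "equivocal" and exclude_equivocal:
        return None
    return category == "obstructed"


# Hypothetical pressure-flow results: (PdetQmax, Qmax).
for pdet, q in [(70, 8), (45, 10), (30, 12)]:
    value = booi(pdet, q)
    print(value, classify(value), reference_positive(value), reference_positive(value, True))
```

The second patient shows why the choice matters: pooled with the unobstructed group he counts as reference-negative, whereas under exclusion he does not enter the two-by-two table at all.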
The overall RoB across most domains was generally low across most studies, although there was significant heterogeneity of definitions of thresholds, index tests and reference tests.
The individual results for each study, organised according to the index test being assessed, are shown in Table 2. The overall results for each type of index test considered are available in Tables 3 and 4. It was not possible to perform subgroup analyses because of a lack of data.
|Study||Threshold value||BOO definition for reference standard||Mean age yr (range)||Mean IPSS (range)||BOOP (%)||SSY (%)||SPY (%)||PPV (%)||NPV (%)|
|Penile cuff test|
|Bianchi ||Griffiths nomogram||BOOI >40||61.5||NR||44||100||63||67.7||100|
|Griffiths ||Griffiths nomogram||BOOI >40||NR||NR||39||64||81||68||78|
|Kazemeyni ||Griffiths nomogram||BOOI >40||66.5||NR||35||88.89||75.7||66.7||93|
|Harding ||PCR index 160%||BOOI >40||63 (20–88)||NR||28||78||84||69||NR|
|Matulewicz ||Modified ICS nomogram||NR||NR||16 (6–30)||NR||75||66||92||NR|
|Salinas ||Nomogram described||BOOI >40||54.1||NR||28||100||55.6||71.4||100|
|Sullivan ||PCR 100%||VPG >5 cm H2O||NR||NR||48||90.7||70.2||73.6||89.2|
|Uroflowmetry|
|Aganovic ||10 ml/s||LPURR >2||64.68||14.48||63||63||88||94||42|
|LPURR >2 + URA >29||72||92||94||68|
|Qmax <15 + PdetQmax >50||67||45||50||63|
|Botkor-Rasmussen ||10 ml/s||BOOI >40||66 (51–85) a||DAN-PSS 4||52||33||100||100||58|
|Chia ||10 ml/s||BOOI >40||64.6 (50–94)||20.3||63||90||48||74||75|
|Dicuio ||10 ml/s||DAMPF score||67.9 (47–86)||22.4 (6–35)||64||NR||NR||100||NR|
|El Saied ||10 ml/s||BOOI >40||61.7 (53–76)||13.4 (4–22)||46||100||37||57.5||100|
|Griffiths ||10 ml/s||BOOI >40||NR||NR||39||59||89||77||77|
|Harding ||10 ml/s||BOOI >40||63 (20–88)||NR||28||81||64||51|
|Hirayama ||10 ml/s||BOOI >40||67.7 (50–83)||17.1 (9–33)||60||NR||NR||65||NR|
|Ku ||10 ml/s||BOOI >40||68 (44–89) a||18.1 (no BOO)||27||57.9||65.8||38.4||81|
|12 ml/s||19.7 (BOO)||77.2||54.2||38.3||86.6|
|Madersbacher ||5 ml/s||LPURR >2||66.5 (53–81)||16||53||16||96||85.1||46.9|
|Oelke ||15 ml/s||CHESS||63 (42–82)||14.4 (2–29)||47||100||25||55||100|
|Oelke ||15 ml/s||BOOI >40||62 (40–89) a||15 (2–30) a||47||99||39||59||97|
|Poulsen ||10 ml/s||BOOI >40||68 (32–90)||10 (no BOO) e||65||68.7||57.4||74.7||50|
|15 ml/s||11 (BOO) e||89.9||31.5||70.6||62.9|
|Reynard ||10 ml/s||Schäfer nomogram||66.5 (45–88)||NR||60||47||70||70||46.5|
|Reynard ||10 ml/s 1st void||BOOI >40||NR||NR||61||71||71||79||61|
|10 ml/s 4th void||29||96||93||47|
|Detrusor/bladder wall thickness|
|Abdel-Aal ||2 mm||BOOI >40||58.7 (50–72)||12.45 (6.5–25)||30||65.7||76||65.7||76|
|El Saied ||2 mm||BOOI >40||61.7 (53–76)||13.4 (4–22)||46||82.7||92.6||90.5||86.2|
|Franco ||6 mm||BOOI >40||67 (48–80)||15 (9–25)||76||73||82||90||50|
|Kessler ||1.5 mm||BOOI >40||67 (59–77) a||17 (no BOO) a||60||100||15||64||100|
|2 mm||22 (BOO) a||92||68||81||85|
|Oelke ||2 mm||CHESS||63 (42–82)||14.4 (2–29)||47||63.6||97.3||95.5||75|
|Oelke ||2 mm||BOOI >40||62 (40–89) a||15 (2–30) a||47||83||95||94||86|
|Aganovic ||5 mm||BOOI >40||65.4 (48–82)||18.2 (6–31)||49||64.5||59.2||NR||NR|
|Manieri ||5 mm||URA >29||64.5 (34–88)||14.91 (0–29)||57||55.4||91||87.9||63.4|
|Ultrasound-estimated bladder weight|
|Han  b||27.86 g||BOOI >40||63.5||19.9||26||61.9||59.8||33.8||82.6|
|Kojima ||35 g||BOOI >40||71 (45–89)||NR||52||85.3||87.1||87.9||84.4|
|External condom catheter|
|Pel  c||Q max /P ext max||BOOI >40||62 (no BOO)||NR||29||90.9||92.3||96.7||80|
|Intravesical prostatic protrusion|
|Aganovic ||10 mm||BOOI >40||65.4 (48–82)||18.2 (6–31)||49||59.6||81.4||73.8||69.6|
|Chia ||10 mm||BOOI >40||64.6 (50–94)||20.3||63||76||92||94||69|
|Dicuio ||10 mm||DAMPF score||67.9 (47–86)||22.4 (6–35)||64||NR||NR||100||NR|
|Lim ||10 mm||BOOI >40||66 (52–88) a||12 (1–32) a||49||46||65||72||46|
|Reis ||10 mm||BOOI >40||64 (56–73)||13 (6–20)||48||80||68.2||69.6||78.9|
|5 mm||BOOI >40||95||50||63.3||91.7|
|Abdel-Aal ||8 mm||BOOI >40||58.65 (50–72)||12.45 (6.5–25)||30||80||80||73.7||85.1|
|Aganovic ||12 mm||BOOI >40||65.3 (48–80)||18.2 (6–31)||NR||59.6||81.3||73.8||69.6|
|Franco ||12 mm||BOOI >40||67 (48–80)||15 (9–25)||76||65||77||88||47|
|Keqin ||8.5 mm||BOOI >40||71 (55–84)||16.8 (G1–2 IPP)||NR||75||82.6||NR||NR|
|18.6 (G3 IPP)|
|Pascual ||10.5 mm||BOOI >40||61.6 (BOO)||14.7 (BOO)||54||90.5||72.2||76||85|
|64.7 (no BOO)||13.7 (no BOO)|
|Doppler ultrasound|
|Belenky ||RI T >0.05||BOOI >40||65.6 (46–76)||NR||75||NR||NR||95||57|
|Ozawa ||VR >1.6||BOOI >40||NR||NR||60||NR||NR||100||NR|
|Prostate volume|
|El Saied ||25 ml||BOOI >40||61.7 (53–76)||13.4 (4–22)||46||87||29.6||51.3||72.7|
|Franco ||38 ml||BOOI >40||67 (48–80)||16 (9–25)||76||72||61||84||44|
|Lim ||40 ml||BOOI >40||66 (52–88) a||12 (1–32) a||49||51||38||65||42|
|Watanabe  d||30 ml and H:W = 0.8||LPURR ≥3||66.4 (49–84)||NR||47||42||100||NR||NR|
|Prostate height|
|Franco ||40 mm||BOOI >40||67 (48–80)||16 (9–25)||76||68||54||82||48|
|Near-infrared spectroscopy|
|MacNab ||NIRS algorithm||Not defined||67.3 (50–91) (BOO)||20.2 (no BOO)||49||85.71||88.89||88.89||85.71|
|56.8 (40–77) (no BOO)||19.6 (BOO)|
|Yurt ||NIRS algorithm||BOOI >40||58.8||17.8||55||86||87.5||89.2||84|
|Zhang ||NIRS algorithm||BOOI >40||68.5 (56–85)||NR||72||68.3||62.5||82.7||42.9|
|Chung ||DP on free flow||BOOI >40||67||19||79||34.6||42.9||69.2||15|
|DP on pressure-flow study||BOOI >40||NR||NR||79||61.1||40||78.6||22.2|
|Stothers ||CART model||BOOI >40||62 (49–91)||19 (12–34)||47||100||87.5||93.8||100|
b Corrected UEBW (UEBW/body surface area).
c Results are based on the 46 out of 75 patients (61.3%) who were able to successfully perform the noninvasive test.
d Prostate volume and height:weight (H:W) ratio.
e Danish Prostatic Symptom Score.
BOOI = bladder outflow obstruction index; BOOP = BOO prevalence; BWT = bladder wall thickness; CART = classification and regression tree; DP = downward pattern; DWT = detrusor wall thickness; DAMPF = detrusor-adjusted mean PURR factor; ECC = external condom catheter; G1–2 = grade 1–2; G3 = grade 3; IPP = intravesical prostatic protrusion; IPSS = International Prostate Symptom Score; LPURR = linear passive urethral resistance relation; NIRS = near-infrared spectroscopy; NR = not reported; NPV = negative predictive value; PCR = penile compression ratio; PPV = positive predictive value; RI = resistive index; SPY = specificity; SSY = sensitivity; UEBW = ultrasound-estimated bladder weight; URA = urethral resistance algorithm; VPG = voiding profilometry gradient across the bladder neck and prostatic urethra in the absence of distal obstruction; VR = velocity ratio.
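Because PPV and NPV depend on BOO prevalence (the BOOP column) as well as on sensitivity and specificity, predictive values from cohorts with different prevalence are not directly comparable. The sketch below applies Bayes' rule to the El Saied DWT row (sensitivity 82.7%, specificity 92.6%, prevalence 46%). The function name is hypothetical, and the 30% prevalence is an arbitrary comparison value chosen for illustration.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) as proportions, given proportions in [0, 1]."""
    tp = sensitivity * prevalence              # true-positive fraction of the cohort
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# El Saied DWT row of Table 2; the result lands close to the reported
# PPV of 90.5% and NPV of 86.2% (small differences reflect rounding).
ppv, npv = predictive_values(0.827, 0.926, 0.46)
print(f"PPV {100 * ppv:.1f}%, NPV {100 * npv:.1f}%")

# Same test characteristics at an arbitrary lower prevalence of 30%.
ppv_low, npv_low = predictive_values(0.827, 0.926, 0.30)
print(f"PPV {100 * ppv_low:.1f}%, NPV {100 * npv_low:.1f}%")
```

With unchanged sensitivity and specificity, the lower prevalence reduces the PPV and raises the NPV, which is one reason sensitivity and specificity are the more portable summary measures across studies.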
|Test||Studies||Pts||Sensitivity||Specificity||Positive predictive value||Negative predictive value|
|( n )||( n )||Median (IQR)||Range||Median (IQR)||Range||Median (IQR)||Range||Median (IQR)||Range|
|Penile cuff test||7||546||88.89 (76.5–95.3)||64–100||70.2 (64.5–78.3)||55.6–84||69 (67.9–72.5)||66.7–92||93 (89.2–100)||78–100|
|Uroflowmetry||16||2580||72 (58.4–89.9)||16–100||64 (38.5–81)||25–100||70 (57.5–79)||32.5–100||70 (57.7–85.2)||46.5–100|
|DWT||8||848||69 (64–82.8)||43–100||88 (72–93.8)||15–100||89.5 (82.7–93.1)||64–100||75.5 (63.8–85.7)||50–100|
|Bladder weight||2||258||73.6||61.9–85.3||73.45||59.8–87.1||60.85||33.8–87.9||83.5||82.6–84.4|
|IPP||10||1013||75.5 (60.9–80)||46–95||78.5 (69.2–81.3)||50–92||73.8 (72.4–85)||69.6–100||69.6 (69–85)||46–85.1|
|Doppler US||2||51||No data||No data||No data||No data||97.5 (96.2–98.7)||95–100||57||No data|
|Prostate volume||3||245||72 (61.5–79.5)||51–87||38 (33.8–49.5)||29.6–61||65 (58.1–74.5)||51.3–84||44 (43–58.3)||42–72.7|
|NIRS||5||282||85.71 (68.3–86)||61.1–100||87.5 (62.5–87.5)||40–87.5||88.89 (82.7–89.2)||78.6–93.8||84 (42.9–85.71)||22.2–100|
|Test||Threshold value||Studies||Pts||Sensitivity||Specificity||Positive predictive value||Negative predictive value|
|( n )||( n )||Median (IQR)||Range||Median (IQR)||Range||Median (IQR)||Range||Median (IQR)||Range|
|PCT||Griffiths NG||3||243||88.9 (76.4–94.4)||64–100||75.7 (69.3–78.3)||63–81||67.7 (67.2–67.9)||66.7–68||93 (85.5–96.5)||78–100|
|UF||10 ml/s||13||2257||68.3 (55.1–74.2)||29–100||70.5 (62.3–89.7)||37–100||74.3 (66–89.5)||38.4–100||68 (54–76)||46.5–100|
|DWT||2 mm||5||467||82.7 (65.7–83)||63.6–92||92.6 (76–95)||68–97.3||90.5 (81–94)||65.7–95.5||85 (76–86)||75–86.2|
|IPP||10 mm||5||473||67.8 (56.2–77)||46–80||74.8 (67.4–84)||65–92||73.8 (72–94)||69.6–94||69.3 (63.2–71.9)||46–78.9|
|NIRS||NIRS algorithm||3||195||85.71 (77–85.8)||68.3–86||87.5 (75–88.1)||62.5–88.9||88.89 (85.7–89)||82.7–89.2||84 (63.4–84.8)||42.9–85.71|
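The summary rows can be recomputed from the per-study values in Table 2. The sketch below does this for DWT at a 2 mm threshold, using the five rows reported at that threshold (Abdel-Aal, El Saied, Kessler, and the two Oelke studies). The quartile convention is an assumption, since the source does not state which one was used: Python's "inclusive" method is used here, and the `summarise` helper is hypothetical.

```python
from statistics import median, quantiles

# Sensitivity and specificity (%) for DWT at 2 mm, copied from Table 2
# in the order Abdel-Aal, El Saied, Kessler, Oelke (CHESS), Oelke (BOOI).
dwt_2mm_sensitivity = [65.7, 82.7, 92, 63.6, 83]
dwt_2mm_specificity = [76, 92.6, 68, 97.3, 95]


def summarise(values: list[float]) -> dict[str, object]:
    """Median, interquartile range, and range across studies."""
    # Assumed quartile convention; other methods give slightly different bounds.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "median": median(values),
        "iqr": (q1, q3),
        "range": (min(values), max(values)),
    }


sens_summary = summarise(dwt_2mm_sensitivity)
spec_summary = summarise(dwt_2mm_specificity)
print(sens_summary)
print(spec_summary)
```

Under this convention the output agrees with the DWT 2 mm row of the threshold-specific summary table (sensitivity 82.7, IQR 65.7–83, range 63.6–92; specificity 92.6, IQR 76–95, range 68–97.3). These are unweighted summaries across studies, so a 50-patient study counts as much as a 160-patient one.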
Seven studies investigated the diagnostic accuracy of PCT. Overall, the diagnostic accuracy was high, with median sensitivity of 88% and specificity of 70%. There was low RoB across most studies, but significant heterogeneity in the threshold values used to diagnose BOO, with three studies using the nomogram developed by Griffiths et al [11 18 22], two using different nomograms [32 42], and two using a penile urethral compression-release index of either 160% or 100% [20 44]. As a result, it is impossible to reliably pool the results of these studies.
Uroflowmetry was assessed in 2580 patients across 16 studies. Thirteen studies used a cutoff value of 10 ml/s to diagnose BOO and reported median sensitivity of 68.3%, specificity of 70.5%, a PPV of 74.3% and NPV of 68% [7 12 13 15 16 18 20 21 26 30 33 34 38 40 41]. However, studies varied considerably in their choice of defining variable and cutoff values. The range of sensitivity and specificity values across studies was so wide that no conclusions can be drawn. As would be expected, raising the cutoff value for maximum flow (Qmax) increased sensitivity at the expense of specificity, and vice versa: in the studies that reported more than one cutoff, sensitivity rose and specificity fell as the cutoff increased from 10 ml/s to 12 or 15 ml/s (Table 2). Baseline symptom severity is also a significant confounder that we are unable to control for with the available data. Overall, the diagnostic accuracy of uroflowmetry in diagnosing BOO appears to be relatively limited compared to the other index tests.
DWT was studied in 848 patients across eight studies [6 8 16 17 24 31 33 34], five of which used a cutoff of 2 mm to define BOO, with high median sensitivity of 82.7%, specificity of 92.6%, a PPV of 90.5%, and NPV of 85%. Furthermore, a well-conducted exploratory study reported a cutoff value of 2.9 mm as having the best diagnostic value, with specificity of 100%. Altered DWT and BWT may have a multifactorial basis, and further assessments in well-designed statistically powered trials are needed to assess wider application in clinical service delivery.
UEBW was only assessed in two studies, both utilising different threshold values to define BOO, and both finding a wide variation in diagnostic accuracy [19 25]. Therefore, little inference can be made from the available data on bladder weight.
The external condom catheter method was assessed in a single study, which reported that up to 73% of patients could be correctly diagnosed with this technique. However, from the limited data available it appears that test failure for various reasons is a limiting factor.
IPP was studied in a total of 1013 patients across ten studies [6 8 9 13 15 17 23 28 36 39]. Five studies used a cutoff of 10 mm to define BOO, and overall reported similar diagnostic accuracy to uroflowmetry alone, with median sensitivity of 67.8% and specificity of 74.8%, a PPV of 73.8% and NPV of 69.3%. However, threshold values varied, making interpretation difficult.
Two studies evaluated the role of Doppler ultrasound, one of which assessed detrusor blood flow and the other assessed urinary flow velocity [10 35]. Owing to small patient numbers, the results on Doppler ultrasound are difficult to interpret with any degree of certainty.
NIRS was assessed in five studies, three of which used the NIRS algorithm to define BOO [14 29 43 46 47]. Overall, diagnostic accuracy was relatively high, with median sensitivity of 85.7% and specificity of 87.5%. The one study using a mathematical modelling and regression tree algorithm showed the highest diagnostic accuracy.
Secondary outcomes were not addressed owing to a lack of data suitable for a critical analysis.
A total of 42 studies recruiting 4444 patients were eligible for inclusion in this SR, which assessed the diagnostic accuracy of nine noninvasive tests. There were significant variations among studies investigating the same test, in terms of both the threshold value used to define BOO on the noninvasive test and the nomograms used to diagnose BOO on invasive urodynamics. Studies reporting on the most commonly used thresholds to define BOO (PCT using the Griffiths nomogram, DWT >2 mm, and the NIRS algorithm) had the highest median sensitivity, ranging from 82.7% to 88.9%. These three tests also had the highest median NPVs of 84–93%. The highest median PPVs were reported for DWT >2 mm and the NIRS algorithm, at ∼90%. The diagnostic accuracy for IPP >10 mm was similar to that for Qmax <10 ml/s on free-flow rate testing. The studies on IPP also appeared to show that specificity increased with the IPSS score, a confounder that would be controlled for in a good prospective trial. The diagnostic ability of the external condom catheter seems promising in the only study included, but these data require further validation in future studies.
Although the overall RoB was low across many domains for the majority of studies, in many studies, the index test and reference standard were performed unblinded, and in some studies the two tests were performed simultaneously by the same investigator who also analysed the results obtained. This could have potentially biased the interpretation of the findings and final conclusions reached.
Pressure-flow studies for evaluation of men with LUTS are often not performed for practical reasons. Several noninvasive techniques have therefore been developed and appear promising in the assessment of men with LUTS. From the evidence reviewed in this paper, PCT, DWT, UEBW, and NIRS show the greatest diagnostic accuracy, although further validation in studies with more stringent methodological standards is required before they can replace invasive urodynamics. Furthermore, a number of factors need to be considered when discussing the generalisability and delivery costs of these tests. PCT may cause discomfort or urethral bleeding, although this has been reported in only 2% of patients, and it has been reported that technical difficulties result in exclusion rates of 23–46% [18 48]. Similarly, the external condom method may also cause discomfort, and results may be affected by low flow rates, low voided volumes, and abdominal straining. Measurement of DWT and UEBW requires specific training and there is a risk of observer error, and NIRS requires specialised equipment. Doppler ultrasound urodynamics suffers from the same limitations of observer error and requires specialised equipment. It is clear that these techniques, either alone or in combination, may be used to aid in decision-making and counselling when evaluating men with LUTS in daily clinical practice, especially if invasive urodynamics are unavailable or contraindicated. However, the quality of the current data is insufficient to recommend the routine use of any noninvasive test over pressure-flow studies in diagnosing BOO in men with LUTS.
A number of studies reviewing the evidence for various noninvasive urodynamic tests have been published in recent years [49–56]. All reviews have reported similar findings to the present review: some noninvasive tests appear promising, especially in combination, but further investigation is required before they can replace invasive urodynamics. Importantly, however, the methodology in these reviews differs significantly from the present SR. Primarily, this SR is based on strict inclusion and exclusion criteria with input from a multidisciplinary expert panel to inform the review question. The robust methods used to synthesise the evidence and analyse the data are the principal strengths of this study and therefore provide a more accurate evaluation of the available evidence compared to the other reviews.
This review has demonstrated that several noninvasive tests seem promising in assessing men with BOO. However, we have highlighted the limitations of the current evidence base in terms of heterogeneity of definitions and threshold values used. Therefore, larger studies with more stringent methodological standards are required to better assess their role in the evaluation of men with LUTS. The limitations of existing individual tests have led many investigators to assess the role of a combination of tests in improving diagnostic accuracy for BOO. Although not covered in this SR, the role of combining tests is a promising area that requires further assessment.
The strengths of this review are the systematic, transparent, and effective approach taken to examine the evidence base, including the use of Cochrane review methodology, RoB assessment using QUADAS-2, and adherence to PRISMA and STARD guidelines. The clinical question was prioritised by a multidisciplinary panel of clinical experts, methodologists, and patient representatives (EAU Non-neurogenic Male LUTS Guideline Panel), and the work was undertaken as part of the panel's clinical practice guideline update for 2016. In addition, the review elements, including participant characteristics, index and reference tests, definitions and thresholds, were developed a priori in conjunction with the panel. The search strategy was complemented by additional sources of potentially important articles, including reference lists for the studies included, and studies identified by the expert panel. This approach ensured a comprehensive review of the literature while maintaining methodological rigour and enabled the authors to put into clinical context the relevance and implications of the review findings. Moreover, the vast majority of studies were prospective in nature, with well-defined index and reference tests, and the overall RoB was generally low across studies. The primary limitation was the large heterogeneity among studies with regard to definitions of index tests and reference standards. Furthermore, owing to a lack of data we were unable to perform any subgroup analyses. Another limitation is the basic assumption that invasive urodynamics is a definitive diagnostic investigation for BOO in men. It is known that results of invasive urodynamics and the nomograms based on pressure-flow studies can have significant inter- and intra-investigator variability, as well as test-retest variation [57 58] . However, in the absence of a more accurate gold standard, all studies on these diagnostic tests will continue to be compared to invasive urodynamics.
We systematically reviewed studies assessing the diagnostic accuracy of noninvasive tests in diagnosing BOO in men with LUTS using effective methods for evidence acquisition and synthesis, with input from a multidisciplinary expert panel to inform the review question and review elements. The findings and clinical relevance were interpreted using an appropriate clinical context provided by the expert panel. Overall, a number of noninvasive tests appear promising, with low RoB across most domains for the great majority of studies. Limitations of the current evidence base include heterogeneity of definitions and thresholds for index tests and reference standards, and therefore this review highlights the need for larger prospective studies with better methodological quality. In spite of these limitations, the findings from this review can help to provide clinical guidance on the accuracy of these tests in daily practice. Therefore, while several tests have shown promising results for noninvasive assessment of BOO, a pressure-flow study remains the gold standard test in determining BOO.
Author contributions: Stavros Gravas had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Gravas, Lam, Drake.
Acquisition of data: Malde, Nambiar, Umbach.
Analysis and interpretation of data: Malde, Nambiar, Umbach, Lam, Bach, Bachmann, Drake, Gacci, Gratzke, Madersbacher, Mamoulakis, Tikkinen, Gravas.
Drafting of the manuscript: Malde.
Critical revision of the manuscript for important intellectual content: Lam, Bach, Bachmann, Drake, Gacci, Gratzke, Madersbacher, Mamoulakis, Tikkinen, Gravas.
Statistical analysis: Malde, Nambiar, Umbach, Lam.
Obtaining funding: None.
Administrative, technical, or material support: Lam, Gravas.
Financial disclosures: Stavros Gravas certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: Arjun K. Nambiar has received a travel grant from Takeda. Thorsten Bach has received speaker honoraria from Cook Urology, Boston Scientific, GSK, and Richard Wolf, participates in a trial for Ipsen, and has received fellowships and travel grants from Lisa Laser. Alexander Bachmann is a company consultant for AMS, Orian Pharma, Schering, Olympus, and Caris Life, and has received speaker honoraria from AMS, Ferring, and Bayer. He participates in trials for AMS, AstraZeneca, and Pfizer. He receives travel grants and research support from AMS, AstraZeneca, and Pfizer, and participates in an AMS-sponsored speaker bureau. Marcus J. Drake has received speaker honoraria from Allergan, Astellas, and Ferring. He has received grants and research support from Allergan, Astellas, and Ferring. Mauro Gacci is a company consultant for Bayer, Ibsa, GSK, Lilly, Pfizer, and Pierre Fabre. He participates in trials for Bayer, Ibsa, and Lilly, and has received travel grants and research support from Bayer, GSK, and Lilly. Christian Gratzke is a company consultant for Astellas Pharma, Bayer, Dendreon, Lilly, Rottapharm-Madaus, and Recordati. He has received speaker honoraria from AMS, Astellas Pharma, Pfizer, GSK, Steba, and Rottapharm-Madaus, and travel grants and research support from AMS, DFG, Bayer Healthcare Research, the EUSP, MSD, and Recordati. Stephan Madersbacher is a company consultant for Astellas, GSK, Lilly, and Takeda, and receives speaker honoraria from Astellas, Böhringer Ingelheim, GSK, Lilly, MSD, and Takeda.
Charalampos Mamoulakis is a company consultant for Astellas, GSK, and Teleflex, and has received speaker honoraria from Eli Lilly. He participates in trials for Astellas, Eli Lilly, Karl Storz Endoscope, and Medivation, and has received fellowships and travel grants from Ariti, Astellas, Boston Scientific, Cook Medical, GSK, Janssen, Karl Storz Endoscope, Porge-Coloplast, and Takeda. Stavros Gravas has received grants or research support from Pierre Fabre Medicament and GSK, travel grants from Angelini Pharma Hellas, Astellas, GSK, and Pierre Fabre Medicament, and speaker honoraria from Angelini Pharma Hellas, Pierre Fabre Medicament, Lilly, and GSK, and is a consultant for Astellas, Pierre Fabre Medicament, and GSK. Thomas B. Lam, Sachin Malde, Roland Umbach, and Kari A.O. Tikkinen have nothing to disclose.
Funding/Support and role of the sponsor: None.
Acknowledgments: Cathy Yuan performed the literature search for this study. Kari A.O. Tikkinen is grateful for grant support from the Academy of Finland (#276046), Competitive Research Funding from the Helsinki and Uusimaa Hospital District, the Jane and Aatos Erkko Foundation, and the Sigrid Jusélius Foundation. These sponsors had no role in this review.