Core smart tags slowing down searches #554

Core smart tags is a great feature addition, but is there a way to turn it off while running searches? It seems to slow down the record import (based on the progress reports in background jobs), and since the studies aren’t screened yet, most of them will likely be excluded and won’t need tags applied.

2 years ago

Good question. Unfortunately, we can’t turn it off on a search-by-search basis. Tagging should take about 45 seconds per 1,000 records, and you can click out of the modal while the search is running.

Sorry for the inconvenience; we couldn’t add these tags without pulling in the PICOs and other data during the search itself. Thanks for the feedback. We’ll definitely try to make searches faster in the future, but for the feature launch it was not possible to make this optional.

2 years ago

Changed the status to Under Consideration

2 years ago

Thanks for the feedback. There was a tradeoff in our design choice. To give some additional detail, the models computing core smart tags (CSTs) are relatively slow (they can only process 20-25 records/sec in an economical manner). Our options were:

1. Wait for CSTs to complete before importing your search. This is slow, but has the advantage of always putting your nest in a “consistent” state (no guessing about when CSTs will be complete/backfilled).
   - This property is especially important for rapid reviewers who may be using CSTs for screening.
2. Run CSTs asynchronously with your search. This is as fast as you’re used to, but has the downside of CSTs taking an arbitrary amount of time to appear in your nest, which could be frustrating.
   - Note: our PICO annotation used to work this way, and it was a source of pain internally and frustration externally.
3. Allow turning off CSTs on select searches. We didn’t like this option due to the possibility of confusion (my eyes glaze over at the thought of all the support emails like “why do I only have partial CSTs in my nest?”).
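
The difference between the first two options can be sketched in a few lines. This is only an illustration of the tradeoff, not the actual service: `compute_tags`, `import_blocking`, `import_background`, and the dictionary-based "nest" are all hypothetical names, and the sleep is a stand-in for the slow tagging model.

```python
import asyncio


async def compute_tags(records: list[str]) -> dict[str, str]:
    # Hypothetical stand-in for the slow tagging model.
    await asyncio.sleep(0.01 * len(records))
    return {record: "tagged" for record in records}


async def import_blocking(nest: dict, records: list[str]) -> None:
    """Option 1: the import finishes only once tags exist (consistent state)."""
    tags = await compute_tags(records)
    nest["records"].extend(records)
    nest["tags"].update(tags)


async def import_background(nest: dict, records: list[str]) -> asyncio.Task:
    """Option 2: records appear immediately; tags are backfilled later."""
    nest["records"].extend(records)

    async def backfill() -> None:
        nest["tags"].update(await compute_tags(records))

    # The task is scheduled here but has not run yet when we return.
    return asyncio.create_task(backfill())


async def demo() -> tuple[int, int, int]:
    blocking_nest = {"records": [], "tags": {}}
    await import_blocking(blocking_nest, ["r1", "r2"])

    background_nest = {"records": [], "tags": {}}
    task = await import_background(background_nest, ["r1", "r2"])
    tags_right_after_import = len(background_nest["tags"])  # still empty
    await task  # tags arrive at some later point

    return (
        len(blocking_nest["tags"]),
        tags_right_after_import,
        len(background_nest["tags"]),
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the background variant the nest briefly holds records with no tags, which is the "inconsistent" window described in option 2.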

Once we get comfortable with the performance characteristics of the new service, we may be able to parallelize the computation of CSTs by 2-3x.
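
For a rough sense of scale, the quoted 20-25 records/sec works out to 40-50 seconds per 1,000 records, in line with the "about 45 seconds" figure earlier in the thread. The sketch below does that arithmetic and shows what a 2-3x speedup would mean. The function name `cst_seconds` is made up for illustration, and it assumes throughput scales linearly with parallelism, which may not hold in practice.

```python
def cst_seconds(records: int, rate_per_sec: float, parallel_factor: float = 1.0) -> float:
    """Estimate CST compute time, assuming throughput scales linearly with parallelism."""
    if rate_per_sec <= 0 or parallel_factor <= 0:
        raise ValueError("rate_per_sec and parallel_factor must be positive")
    return records / (rate_per_sec * parallel_factor)


if __name__ == "__main__":
    for rate in (20, 25):
        for factor in (1, 2, 3):
            seconds = cst_seconds(1000, rate, factor)
            print(f"{rate} rec/s x{factor}: {seconds:.1f} s per 1,000 records")
```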

2 years ago

Oh, one other note: the release of CSTs coincided with a subtle bugfix (I forgot to include it in the release notes, correcting that now). Previously, we weren’t generating screening probabilities for newly imported records if your nest already had a model trained. So that fix will add some extra post-import processing time.

2 years ago

Hi both, thanks for the info! Definitely understand the need to adapt for new features.

2 years ago

I did some experimentation on this, and it’s quite a bit slower than 25 records/sec on small searches. I dug in and found considerable slowness in one of our CST data storage operations, which was taking 10+ seconds (unrelated to the computation of CSTs themselves, which runs at the advertised 25 records/sec). We’ll get a patch out for the storage problem (we’d typically expect write operations to take milliseconds or less, not seconds), which should alleviate some of the pain for smaller imports!
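
To see why a fixed storage delay hurts small imports the most, here is a rough model: total time is a fixed overhead plus the record count divided by the compute rate. The 10-second overhead and 25 records/sec come from the numbers above; the function name `effective_rate` and the simple additive model are illustrative assumptions, not a measurement of the real system.

```python
def effective_rate(
    records: int,
    compute_rate: float = 25.0,
    fixed_overhead_s: float = 10.0,
) -> float:
    """Observed records/sec when a fixed delay is added to the compute time."""
    if records <= 0 or compute_rate <= 0 or fixed_overhead_s < 0:
        raise ValueError("records and compute_rate must be positive; overhead non-negative")
    total_s = fixed_overhead_s + records / compute_rate
    return records / total_s


if __name__ == "__main__":
    for n in (1, 25, 1000, 10000):
        print(f"{n} records: {effective_rate(n):.2f} records/sec")
```

Under this model a 25-record import runs at a little over 2 records/sec, while a 10,000-record import stays close to the full 25.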

2 years ago

Performance should be substantially improved (with largest relative effect on small imports) with release 1.88.2. For reference, a single record import into a nest of 10k records went from 15 seconds (boooo) to 2-3 (hooray!); this is all-in, including CST and robot screener post-processing time.

2 years ago

Changed the status to Completed

2 years ago

(And, as usual, thanks for calling this one out Erin!)

2 years ago

Awesome, thanks Karl! That will help a lot with adding sources from bibliomining NCTs and adding records in other sources.

2 years ago
