This post is closed.

Sort bibliomined studies #463

It’d be nice to be able to sort by title/author.

4 years ago
Changed the status to Under Consideration
4 years ago

Thanks for filing. Can I ask what the main goal of sorting on the bibliomine page is? Is there a task you are doing there that can’t be done from inspector once the studies are added? Let me know!

4 years ago

So that I can cross reference the references in the PDF to make sure everything went in correctly. It’s really annoying because they’re completely out of order. I’m currently ctrl+F’ing to find authors to make sure the bibliomine was accurate.

4 years ago

It’s mainly to make sure that bibliomining got everything and didn’t miss anything, which is hard to do if the order they appear in bibliomine is not the order they appear in the PDF.

4 years ago

Thanks Nicole, we’ll take a look into how feasible it would be to maintain order, as solving the root problem is more desirable. Agreed that not maintaining order is frustrating!

4 years ago

Good news: By parallelizing our requests to Crossref, we’re able to take the processing time for your example study (Bai 2021) from 5-10 minutes down to <= 1 minute! The breakdown of where time is spent (this example has 50 references):

  • 5-10 seconds running CERMINE on the full text
  • ~45 seconds in total searching Crossref for each reference's title
  • a couple of seconds retrieving full bibliographic data for each reference from PubMed and/or Crossref
  • a couple of seconds deduping against records already in your nest and inserting into our DB

Previously, bullet 2 was taking around 10 seconds per reference, and we were running those searches serially, which comes to about 500 seconds for the example PDF. Crossref’s API allows a substantially higher request rate (50 requests/second) than we were making serially, so this seems to be a sustainable improvement :)
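
For anyone curious what the serial-to-parallel change looks like in principle, here is a minimal sketch. It is not the actual implementation: `lookup_title` is a hypothetical stand-in that sleeps instead of calling Crossref, and the worker count of 10 is an assumption picked for illustration. It also shows that `ThreadPoolExecutor.map` hands results back in input order, so running lookups in parallel does not by itself scramble the order of the references.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def lookup_title(title: str) -> dict:
    """Hypothetical stand-in for one title search; makes no network call."""
    time.sleep(0.05)  # simulated latency (the thread reports ~10 s per real lookup)
    return {"title": title, "doi": None}


def lookup_serial(titles: list[str]) -> list[dict]:
    # One lookup at a time: total time is roughly latency * number of titles.
    return [lookup_title(t) for t in titles]


def lookup_parallel(titles: list[str], max_workers: int = 10) -> list[dict]:
    # Up to max_workers lookups in flight at once. map() yields results in
    # the order of the inputs, regardless of which lookup finishes first.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lookup_title, titles))


# 50 placeholder titles, matching the size of the example in this thread.
titles = [f"Reference {i}" for i in range(1, 51)]

start = time.perf_counter()
parallel_results = lookup_parallel(titles)
parallel_seconds = time.perf_counter() - start

print(f"parallel: {parallel_seconds:.2f} s for {len(titles)} lookups")
```

With real requests the worker count would need to stay under whatever rate the API allows; the sleep here only imitates waiting on the network.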

Mildly critical question: Our current ordering puts highest quality/confidence results first: PubMed (PMID), then Crossref (DOI), then no external bibliographic data found. That seems optimal if you want to do a fast scan. Should we really prefer order of citation over quality/confidence of the result?

4 years ago

Oh also - we unfortunately do not maintain retrospective citation rank data, so our ability to do citation-based ordering would be limited to new imports.

4 years ago

I think that nobody is going to bother to make sure the import is correct otherwise. So, I’d make decisions on priority based on what you think is most important (ordering by quality vs. having no logical order that can be applied to the paper thereby disincentivizing people from checking imports.)

It took me way too much time to check that the import was right on a paper. To see how annoying it is to check 50+ studies in the references when they’re out of order, try it yourself on one of the papers in that nest.

In an ideal world, we’d have a column for confidence and could choose to sort on that, and we’d have a column for author name and could choose to sort on that if we wanted. There’s value in looking at higher certainty/quality imports (not denying that), but when there is no logical order to apply against the references in the paper, you’re ctrl+f’ing every author name to see if it imported correctly. If there are 50 references and it takes 5 seconds to find each one and check that it was correct, it will take roughly 5 min per PDF. 5 min on a task that should take <1 min is frustrating, especially since it’s very tedious.
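
To make the sortable-columns idea concrete, here is a minimal sketch of the three orderings discussed in this thread. Everything in it is an assumption for illustration: the `MinedReference` record, its field names, and the sample entries are invented, and the tiers simply follow the order described above (PMID first, then DOI, then no external data).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MinedReference:
    # Hypothetical record; field names are illustrative only.
    citation_index: int          # position in the PDF's reference list
    first_author: str
    pmid: Optional[str] = None
    doi: Optional[str] = None


def confidence_tier(ref: MinedReference) -> int:
    """Lower is better: 0 = has PMID, 1 = DOI only, 2 = no external data."""
    if ref.pmid:
        return 0
    if ref.doi:
        return 1
    return 2


# Invented placeholder entries, not real references.
refs = [
    MinedReference(1, "Smith", doi="10.0000/placeholder-1"),
    MinedReference(2, "Nguyen", pmid="00000001"),
    MinedReference(3, "Jones"),
    MinedReference(4, "Adams", pmid="00000002"),
]

# Confidence first, with citation order as the tie-breaker within a tier.
by_confidence = sorted(refs, key=lambda r: (confidence_tier(r), r.citation_index))
# Same order as the PDF, for checking the import line by line.
by_citation = sorted(refs, key=lambda r: r.citation_index)
# Alphabetical by first author, case-insensitive.
by_author = sorted(refs, key=lambda r: r.first_author.casefold())

print([r.citation_index for r in by_confidence])
print([r.citation_index for r in by_citation])
print([r.first_author for r in by_author])
```

Since all three are just different sort keys over the same records, keeping the citation position on each record is what makes the PDF-order view possible at all.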

4 years ago

No, I agree: if you want to do a comprehensive check against the original publication, it’s untenable to use a different ordering. I was questioning whether a comprehensive check is necessary. The imagined alternative is using bibliomine to dredge up as many relevant refs as possible, and not caring about comprehensiveness. Maybe that’s purely imagined though :)

4 years ago
Changed the status to In Progress
4 years ago

I am pretty type A and meticulous so I tend to want to check to make sure it worked right. But that’s me. lol

4 years ago

CERMINE is far from perfect, so I appreciate that!

4 years ago

Complete with release 1.45.2. Order of references will be maintained (however, sometimes references themselves will be missing if CERMINE failed to identify the reference entirely). Thanks for recommending this needed improvement :)

I’ll also note - bibliomines should be 3-10x faster than before. We changed our Crossref querying strategy to be parallel instead of serial, which has made a huge difference.

4 years ago
Changed the status to Completed
4 years ago

Amazing! You knocked it out of the park per usual. Thanks, Karl!

4 years ago