Ingest multiple PDFs from a file library #267

It would be great if NK could look at a folder full of PDFs, match them to items in a specific nest, and, if no good match is found, suggest citations to match the PDFs. Zotero has a similar feature you can review: https://www.zotero.org/support/retrieve_pdf_metadata.
This will help users migrate already retrieved items to NK, making it easier to jump-start a research project. Ideally, anything without a 100% match could then be manually reviewed, as is done in NK for duplicate detection.

5 years ago

Thanks Megan - agreed! See similar discussion here: https://nested-knowledge.nolt.io/190

5 years ago

Are you thinking these PDFs should be imported into your nest as new records, or only matched to existing records, as their corresponding full text?

5 years ago

If they match existing records, they should be attached to those records. If they match new records, those new citations should go to a location (maybe just screening) to confirm that the match is good.
If the match isn’t good, maybe they go into a holding pen for manual review, as is done now with potential duplicates.
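The routing proposed above can be sketched as a small decision function. This is a hypothetical illustration of the suggestion, not how NK implements it: the `Destination` names, the `route_pdf` function, and the numeric `match_score` with a default `threshold` of 1.0 (standing in for "a 100% match") are all assumptions.

```python
from enum import Enum
from typing import Optional


class Destination(Enum):
    ATTACH_TO_EXISTING = "attach the PDF as full text of an existing record"
    SCREENING = "import as a new citation and confirm the match in screening"
    HOLDING_PEN = "hold for manual review, like potential duplicates"


def route_pdf(existing_record_id: Optional[str],
              suggested_citation: Optional[dict],
              match_score: float,
              threshold: float = 1.0) -> Destination:
    """Decide where an ingested PDF goes (hypothetical sketch of the proposal)."""
    # Anything short of a confident match goes to manual review.
    if match_score < threshold:
        return Destination.HOLDING_PEN
    # A confident match to a record already in the nest: attach the full text.
    if existing_record_id is not None:
        return Destination.ATTACH_TO_EXISTING
    # A confident match to a citation found elsewhere: confirm it in screening.
    if suggested_citation is not None:
        return Destination.SCREENING
    # No candidate at all: fall back to manual review.
    return Destination.HOLDING_PEN


print(route_pdf("rec-1", None, 1.0))                  # Destination.ATTACH_TO_EXISTING
print(route_pdf(None, {"title": "Some paper"}, 1.0))  # Destination.SCREENING
print(route_pdf("rec-1", None, 0.8))                  # Destination.HOLDING_PEN
```

Checking the score first keeps the "anything without a 100% match gets reviewed" rule in one place, regardless of whether the candidate is an existing record or a new citation.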

5 years ago

Changed the status to Under Consideration

4 years ago

Changed the status to Planned

4 years ago

Changed the status to In Progress

4 years ago

Note: we’ve added the ability to upload FT PDFs as records (under Other Sources in Literature Search); see release 1.48.0. We’re using CERMINE, which extracts bibliographic data from the PDF itself; we then look up the record on PubMed/CrossRef and attach higher-quality bibliographic data when available.
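As a rough sketch of "attach higher-quality bibliographic data when available", the helper below prefers fields returned by a PubMed/CrossRef lookup and falls back to the fields extracted from the PDF. The `merge_metadata` name and the flat dict-of-fields representation are assumptions for illustration; this is neither CERMINE's output format nor NK's actual merge logic.

```python
from typing import Optional


def merge_metadata(extracted: dict, looked_up: Optional[dict]) -> dict:
    """Prefer looked-up bibliographic fields; fall back to PDF-extracted ones."""
    merged = dict(extracted)  # copy so the caller's dict is not mutated
    if looked_up:
        for field, value in looked_up.items():
            if value:  # ignore empty values so they don't erase extracted data
                merged[field] = value
    return merged


extracted = {"title": "Effcts of X on Y", "doi": "10.1000/xyz", "year": ""}
looked_up = {"title": "Effects of X on Y", "doi": "", "year": "2019"}
print(merge_metadata(extracted, looked_up))
# {'title': 'Effects of X on Y', 'doi': '10.1000/xyz', 'year': '2019'}
```

When no lookup result exists (`looked_up` is `None`), the extracted fields are returned unchanged.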

Important: this doesn’t attach FTs to existing records. That will be coming soon, however!

4 years ago

With release 1.53.0, we’ve added the ability to bulk-import FTs for existing records. The interface is pretty simple: multi-select PDFs from your file dialog, we mine bibliographic data from the PDFs, and we match that data to your records using title and DOI.
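Matching on title and DOI could work roughly as follows: try an exact DOI match first, then fall back to an exact match on a normalized title. This is a hypothetical sketch, not NK's implementation; `match_pdf`, `normalize_doi`, `normalize_title`, the list of DOI prefixes, and the record dict shape (`id`, `doi`, `title`) are assumptions, and `records` stands for whatever subset of the nest is under consideration.

```python
import re
import unicodedata
from typing import Optional

# Resolver prefixes that often precede a DOI (assumed list, not exhaustive).
_DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                 "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a leading resolver prefix, if any."""
    doi = doi.strip().lower()
    for prefix in _DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def normalize_title(title: str) -> str:
    """Drop accents, lowercase, and collapse punctuation/whitespace to single spaces."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()


def match_pdf(pdf_meta: dict, records: list) -> Optional[str]:
    """Return the id of the matching record, or None (hypothetical helper).

    `records` is assumed to be pre-filtered, e.g. to included records only
    or to records that do not yet have a full text.
    """
    # 1. Exact match on the normalized DOI.
    if pdf_meta.get("doi"):
        target = normalize_doi(pdf_meta["doi"])
        for rec in records:
            if rec.get("doi") and normalize_doi(rec["doi"]) == target:
                return rec["id"]
    # 2. Fall back to an exact match on the normalized title.
    if pdf_meta.get("title"):
        target = normalize_title(pdf_meta["title"])
        if target:  # skip titles that normalize to nothing
            for rec in records:
                if rec.get("title") and normalize_title(rec["title"]) == target:
                    return rec["id"]
    return None


records = [
    {"id": "r1", "doi": "10.1000/ABC", "title": "Deep Learning for X"},
    {"id": "r2", "doi": None, "title": "A Study of Café Effects"},
]
print(match_pdf({"doi": "https://doi.org/10.1000/abc"}, records))  # r1
print(match_pdf({"title": "a study of cafe effects!"}, records))   # r2
print(match_pdf({"title": "Unrelated"}, records))                  # None
```

Trying the DOI first is a natural choice because a normalized DOI identifies a work unambiguously, whereas titles can differ in punctuation, accents, or case; a PDF that fails both checks would simply be left unmatched.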

This is positioned as a Bulk Action inside Inspector: Full Text Import. Here’s the rationale:

We keep record import/creation distinct from attaching FTs.
Positioning this inside Inspector allows maximal control over which records are considered for matching (e.g., only included records, or only records without an existing FT). Considering all records is trivial as well.
This is naturally paired with the Unpaywalled import option, and gives us a central location for other FT sources we may need to add in the future.

4 years ago

We also addressed a variety of other pieces of feedback related to this issue:

Speeding up our bibliographic mining tools: in our testing, we observed a 2-4x speedup.
A new progress tracker whenever your nest has a background job running (pictured; just click the spinner to see ongoing progress, and the spinner will disappear when the job is complete). This is useful whenever you run something that takes a while, including search, FT matching, and model training. If you’ve ever accidentally clicked out of the search progress bar, hopefully you’ll appreciate this!
The Inspector FT column and filters now provide improved audit data, specifically the origin of each record’s full text in AutoLit (pictured).

We appreciate all of you who requested these features, and we hope they lead to improvements in your workflows :)

4 years ago

Changed the status to Completed

4 years ago
