Bibliographic Sync in overwrite mode #547

Sometimes the files exported from search engines contain truncated data. Sync could retrieve the data from Crossref and compare it. When the existing data in the NK entry is contained in the Crossref data and the Crossref data is more comprehensive, it would be beneficial to replace the existing metadata with the data retrieved from Crossref.

2 years ago

This is a great recommendation and would be a useful remedy for botched or otherwise low quality imports.

Determining “more comprehensive” can be challenging: does more comprehensive mean longer in length? Perhaps we should just offer an “overwrite” option for sync that replaces any field in the record with data available from the external service (PubMed or Crossref).

Side note: It’s pretty unlikely that Crossref will have more comprehensive records than wherever you imported records from. For example, Crossref has abstracts for <10% of records in my experience. We will always prefer PubMed to Crossref whenever a PubMed record is available.

2 years ago

Changed the status to

Under Consideration

2 years ago

I just found the sync button in edit mode, which does overwrite, but it “destroys” content if the attributes from Crossref are subformatted:

e.g. DOI 10.1002/asi.24851
After pressing Sync, only the word “Abstract” was left in the abstract. This is probably due to the embedded formatting in the JSON response, where the abstract is encoded like this:
"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Reviews have long been recognized as among the most important forms of scientific communication. The rapid growth of the primary literature has further increased the need for reviews to distill and interpret the literature. This review on Reviews and Reviewing: Approaches to Research Synthesis encompasses the evolution of the review literature, taxonomy of review literature, uses and users of reviews, the process of preparing reviews, assessment of review quality and impact, the impact of information technology on the preparation of reviews, and research opportunities for information science related to reviews and reviewing. In addition to providing a synthesis of prior research, this review seeks to identify gaps in the published research and to suggest possible future research directions.<\/jats:p>". So Sync does not understand the substructuring of the abstract attribute.

2 years ago

and by more comprehensive I mean:

it contains > 90% of the existing value
it has at least n words more (e.g. 10 words)
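To make the two criteria above concrete, here is a rough sketch of how such a check could look. It is only one possible reading: "contains > 90% of the existing value" is interpreted as word-level overlap, and the function name, the parameter names, and the handling of an empty existing value are all assumptions for illustration, not anything Sync actually does.

```python
import re
from collections import Counter


def _words(text):
    """Lowercase word tokens; punctuation is ignored."""
    return re.findall(r"\w+", text.lower())


def is_more_comprehensive(existing, candidate, min_overlap=0.9, min_extra_words=10):
    """Hypothetical check: does `candidate` contain more than `min_overlap`
    of the words in `existing` and add at least `min_extra_words` words?"""
    old, new = _words(existing), _words(candidate)
    if not old:
        # Assumption: an empty field is improved by any non-empty value.
        return bool(new)
    # Multiset intersection, so repeated words are only matched as often
    # as they occur in both texts.
    shared = sum((Counter(old) & Counter(new)).values())
    overlap = shared / len(old)
    return overlap > min_overlap and len(new) - len(old) >= min_extra_words


truncated = "Reviews have long been recognized"
full = truncated + (" as among the most important forms of scientific"
                    " communication in the primary literature")
print(is_more_comprehensive(truncated, full))
```

A word-level comparison tolerates small punctuation differences between the import and Crossref, which a plain substring test would not.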

2 years ago

Changed the status to

Squashing Bug

2 years ago

Ah yes, I forgot that our Sync feature already does overwrite existing attributes (but leaving in place any fields that the external database does not offer itself).
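As a rough illustration of that behavior (not our actual implementation), the overwrite rule can be sketched like this. The function name and the plain-dict record shape are assumptions made for the example.

```python
def sync_overwrite(record, external):
    """Hypothetical merge: fields supplied by the external service win;
    fields it does not offer (missing or empty) are left in place."""
    merged = dict(record)
    for field, value in external.items():
        if value in (None, "", [], {}):
            continue  # nothing offered for this field, keep existing value
        merged[field] = value
    return merged


record = {"title": "Reviews and Rev", "abstract": "Imported abstract", "keywords": ["reviews"]}
external = {"title": "Reviews and Reviewing", "abstract": None}
print(sync_overwrite(record, external))
```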

That case is quite interesting. We should be properly importing the entire jats data structure; we’ll look into it immediately.

In general, we process all XML formats (which will typically be jats and html) down to plain text. I suspect in this case we are only processing the first XML node, expecting there to typically be a single node representing the abstract.
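For illustration only, flattening every node of such an abstract (rather than just the first one) could look roughly like the sketch below. It uses Python's standard xml.etree.ElementTree; the function name and the wrapper element are assumptions, and the namespace URI is bound only so that the jats: prefix parses. This is not a description of our actual parser.

```python
import xml.etree.ElementTree as ET

# Assumption: any URI works here, it only needs to bind the "jats" prefix.
JATS_NS = "http://www.ncbi.nlm.nih.gov/JATS1"


def jats_to_text(fragment):
    """Flatten a JATS abstract fragment to plain text, one line per
    top-level node. `fragment` is the already JSON-decoded string,
    so "<\\/jats:p>" has become "</jats:p>"."""
    # The fragment has several sibling nodes and no root, so wrap it.
    root = ET.fromstring(f'<root xmlns:jats="{JATS_NS}">{fragment}</root>')
    chunks = [root.text or ""]
    for node in root:
        chunks.append("".join(node.itertext()))  # text of node and descendants
        chunks.append(node.tail or "")           # loose text after the node
    cleaned = (" ".join(chunk.split()) for chunk in chunks)
    return "\n".join(chunk for chunk in cleaned if chunk)


abstract = ("<jats:title>Abstract</jats:title>"
            "<jats:p>Reviews have long been recognized as among the most "
            "important forms of scientific communication.</jats:p>")
print(jats_to_text(abstract))
```

Reading only the first child of the wrapper would yield just "Abstract", which matches the symptom reported above. Note that this sketch expects well-formed markup; a plain-text abstract containing a bare "&" or "<" would need escaping before parsing.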

2 years ago

Indeed, this was a quick fix. You should find the jats issue resolved in our next release!

2 years ago

Changed the status to

Completed

2 years ago

One underlying issue (jats parsing) was resolved with release 1.84.0 tonight. Thanks for reporting!

2 years ago
