arXiv API#
The arXiv API allows programmatic access to arXiv’s e-print content and metadata. “The goal of the interface is to facilitate new and creative use of the vast body of material on the arXiv by providing a low barrier to entry for application developers.” https://arxiv.org/help/api
The API’s user manual (https://arxiv.org/help/api/user-manual) provides helpful documentation for using the API and retrieving article metadata.
Our examples below will introduce you to the basics of querying the arXiv API.
Install Packages#
import urllib
import arxiv
import requests
import json
import csv
import pandas as pd
from collections import Counter, defaultdict
import numpy as np # for array manipulation
import matplotlib.pyplot as plt # for data visualization
%matplotlib inline
import datetime
Query the API#
Perform a simple query for “graphene.” We’ll limit results to the titles of the 10 most recent papers.
# Search.results() is deprecated; pass the search to a Client instead.
client = arxiv.Client()
search = arxiv.Search(
    query="graphene",
    max_results=10,
    sort_by=arxiv.SortCriterion.SubmittedDate
)
for result in client.results(search):
    print(result.title)
Photovoltaic Performance of a Rotationally Faulted Multilayer Graphene/n-Si Schottky Junction
Weak localization and antilocalization corrections to nonlinear transport: a semiclassical Boltzmann treatment
Four Moiré materials at One Magic Angle in Helical Quadrilayer Graphene
Unusual spin-triplet superconductivity in monolayer graphene
Intermediate diffusive-ballistic electron conduction around mesoscopic defects in graphene
Giant enhancement of terahertz high-harmonic generation by cavity engineering of Dirac semimetal
Re-entrant superconductivity at an oxide heterointerface
Wave Packet Propagation through Graphene with Square and Triangular Patterned Circular Potential Scatterers
The role of stacking and strain in mean-field magnetic moments of multilayer graphene
Charge and Valley Hydrodynamics in the Quantum Hall Regime of Gapped Graphene
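Each search ultimately becomes an HTTP GET request against the API's query endpoint, using the parameters described in the user manual. As a rough sketch using only the standard library, a URL of the same shape can be assembled by hand. Note that build_query_url is a hypothetical helper for illustration, not part of the arxiv package, and the endpoint is simply the one that appears in the client's request URLs.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the arxiv client's request URLs.
API_ENDPOINT = "https://export.arxiv.org/api/query"


def build_query_url(search_query, start=0, max_results=10,
                    sort_by="submittedDate", sort_order="descending"):
    """Assemble an arXiv API query URL (hypothetical helper for illustration)."""
    params = {
        "search_query": search_query,
        "id_list": "",
        "sortBy": sort_by,
        "sortOrder": sort_order,
        "start": start,
        "max_results": max_results,
    }
    # urlencode percent-encodes quotes and turns spaces into '+'.
    return API_ENDPOINT + "?" + urlencode(params)


# The 10 most recently submitted papers matching "graphene".
graphene_url = build_query_url("graphene")
print(graphene_url)

# A quoted phrase search, sorted by relevance, requesting the second page of 1000.
phrase_url = build_query_url('"quantum dots"', start=1000, max_results=1000,
                             sort_by="relevance")
print(phrase_url)
```

Seeing the raw parameters (start, max_results, sortBy, sortOrder) can make the client's pagination settings easier to reason about, since each page it fetches is one such request.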
Do another query for the topic “quantum dots,” but note that you could swap in a topic of your liking.
You can define a custom arXiv API client with specialized pagination behavior. This time we’ll process each paper as it’s fetched rather than exhausting the result generator into a list; this is useful for running analysis while the client sleeps.
Because this arxiv.Search doesn’t bound the number of results with max_results, it will fetch every matching paper (roughly 10,000). This may take several minutes.
results_generator = arxiv.Client(
    page_size=1000,
    delay_seconds=3,
    num_retries=3
).results(arxiv.Search(
    query='"quantum dots"',
    id_list=[],
    sort_by=arxiv.SortCriterion.Relevance,
    sort_order=arxiv.SortOrder.Descending,
))

quantum_dots = []
for paper in results_generator:
    # You could do per-paper analysis here; for now, just collect them in a list.
    quantum_dots.append(paper)
---------------------------------------------------------------------------
UnexpectedEmptyPageError Traceback (most recent call last)
File /opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/arxiv/__init__.py:648, in Client._parse_feed(self, url, first_page, _try_index)
647 try:
--> 648 return self.__try_parse_feed(url, first_page=first_page, try_index=_try_index)
649 except (
650 HTTPError,
651 UnexpectedEmptyPageError,
652 requests.exceptions.ConnectionError,
653 ) as err:
File /opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/arxiv/__init__.py:689, in Client.__try_parse_feed(self, url, first_page, try_index)
688 if len(feed.entries) == 0 and not first_page:
--> 689 raise UnexpectedEmptyPageError(url, try_index, feed)
691 if feed.bozo:
UnexpectedEmptyPageError: Page of results was unexpectedly empty (https://export.arxiv.org/api/query?search_query=%22quantum+dots%22&id_list=&sortBy=relevance&sortOrder=descending&start=1000&max_results=1000)
During handling of the above exception, another exception occurred:
UnexpectedEmptyPageError Traceback (most recent call last)
Cell In[3], line 13
1 results_generator = arxiv.Client(
2 page_size=1000,
3 delay_seconds=3,
(...)
9 sort_order=arxiv.SortOrder.Descending,
10 ))
12 quantum_dots = []
---> 13 for paper in results_generator:
14 # You could do per-paper analysis here; for now, just collect them in a list.
15 quantum_dots.append(paper)
File /opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/arxiv/__init__.py:622, in Client._results(self, search, offset)
620 break
621 page_url = self._format_url(search, offset, self.page_size)
--> 622 feed = self._parse_feed(page_url, first_page=False)
File /opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/arxiv/__init__.py:656, in Client._parse_feed(self, url, first_page, _try_index)
654 if _try_index < self.num_retries:
655 logger.debug("Got error (try %d): %s", _try_index, err)
--> 656 return self._parse_feed(url, first_page=first_page, _try_index=_try_index + 1)
657 logger.debug("Giving up (try %d): %s", _try_index, err)
658 raise err
File /opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/arxiv/__init__.py:658, in Client._parse_feed(self, url, first_page, _try_index)
656 return self._parse_feed(url, first_page=first_page, _try_index=_try_index + 1)
657 logger.debug("Giving up (try %d): %s", _try_index, err)
--> 658 raise err
File /opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/arxiv/__init__.py:648, in Client._parse_feed(self, url, first_page, _try_index)
641 """
642 Fetches the specified URL and parses it with feedparser.
643
644 If a request fails or is unexpectedly empty, retries the request up to
645 `self.num_retries` times.
646 """
647 try:
--> 648 return self.__try_parse_feed(url, first_page=first_page, try_index=_try_index)
649 except (
650 HTTPError,
651 UnexpectedEmptyPageError,
652 requests.exceptions.ConnectionError,
653 ) as err:
654 if _try_index < self.num_retries:
File /opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/arxiv/__init__.py:689, in Client.__try_parse_feed(self, url, first_page, try_index)
687 feed = feedparser.parse(resp.content)
688 if len(feed.entries) == 0 and not first_page:
--> 689 raise UnexpectedEmptyPageError(url, try_index, feed)
691 if feed.bozo:
692 logger.warning(
693 "Bozo feed; consider handling: %s",
694 feed.bozo_exception if "bozo_exception" in feed else None,
695 )
UnexpectedEmptyPageError: Page of results was unexpectedly empty (https://export.arxiv.org/api/query?search_query=%22quantum+dots%22&id_list=&sortBy=relevance&sortOrder=descending&start=1000&max_results=1000)
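The traceback above shows the client giving up after its retries when a page of results comes back empty, which stops the loop with an exception. One way to carry on with whatever was fetched is to catch the exception around the loop. The sketch below is a minimal illustration under stated assumptions: collect_until_error is a hypothetical helper, and a stand-in generator and exception replace the real client so that the example runs offline.

```python
def collect_until_error(results, expected_errors):
    """Collect items from an iterable, stopping early on an expected error.

    Returns (items, error); error is None if iteration finished cleanly.
    Hypothetical helper for illustration; not part of the arxiv package.
    """
    items = []
    try:
        for item in results:
            items.append(item)
    except expected_errors as err:
        return items, err
    return items, None


# Stand-ins so the sketch runs offline: a fake error type and a generator
# that fails after yielding three "papers".
class FakeEmptyPageError(Exception):
    pass


def fake_results():
    yield from ["paper 1", "paper 2", "paper 3"]
    raise FakeEmptyPageError("Page of results was unexpectedly empty")


papers, error = collect_until_error(fake_results(), FakeEmptyPageError)
print(len(papers), "papers collected; stopped by:", error)
```

With the real client, the call would presumably look like collect_until_error(results_generator, arxiv.UnexpectedEmptyPageError), assuming the installed arxiv version exposes that exception at the top level of the package, as the file paths in the traceback suggest. Lowering page_size is another thing to try, though whether it avoids empty pages is not something this page establishes.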
Organize and analyze your results#
Create a dataframe to better analyze your results. This example uses Python’s built-in vars function to convert search results into Python dictionaries of paper metadata.
qd_df = pd.DataFrame([vars(paper) for paper in quantum_dots])
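To see what vars contributes here, the toy sketch below applies it to a stand-in class. Paper is hypothetical and much simpler than the real arxiv.Result class, and the DOI value is a made-up placeholder.

```python
class Paper:
    """Hypothetical stand-in for a search result; not the arxiv.Result class."""

    def __init__(self, title, doi):
        self.title = title
        self.doi = doi
        self._raw = {"title": title}  # "private" attributes are included too


papers = [Paper("Paper A", "10.0000/example"), Paper("Paper B", None)]

# vars(obj) returns the object's attribute dictionary: one dict per paper.
rows = [vars(paper) for paper in papers]
print(rows[0])
```

Each dictionary becomes one row when passed to pd.DataFrame, and its keys become the column names. Because vars includes every instance attribute, underscore-prefixed ones such as _raw end up as columns as well.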
We’ll look at the first 10 results.
qd_df.head(10)
| | entry_id | updated | published | title | authors | summary | comment | journal_ref | doi | primary_category | categories | links | pdf_url | _raw |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | http://arxiv.org/abs/cond-mat/0310363v1 | 2003-10-15 20:15:59+00:00 | 2003-10-15 20:15:59+00:00 | Excitonic properties of strained wurtzite and ... | [Vladimir A. Fonoberov, Alexander A. Balandin] | We investigate exciton states theoretically in... | 18 pages, accepted for publication in the Jour... | J. Appl. Phys. 94, 7178 (2003) | 10.1063/1.1623330 | cond-mat.mes-hall | [cond-mat.mes-hall] | [http://dx.doi.org/10.1063/1.1623330, http://a... | http://arxiv.org/pdf/cond-mat/0310363v1 | {'id': 'http://arxiv.org/abs/cond-mat/0310363v... |
1 | http://arxiv.org/abs/2008.11666v1 | 2020-08-26 16:48:21+00:00 | 2020-08-26 16:48:21+00:00 | A two-dimensional array of single-hole quantum... | [F. van Riggelen, N. W. Hendrickx, W. I. L. La... | Quantum dots fabricated using techniques and m... | 7 pages, 4 figures | None | 10.1063/5.0037330 | cond-mat.mes-hall | [cond-mat.mes-hall] | [http://dx.doi.org/10.1063/5.0037330, http://a... | http://arxiv.org/pdf/2008.11666v1 | {'id': 'http://arxiv.org/abs/2008.11666v1', 'g... |
2 | http://arxiv.org/abs/cond-mat/0411742v1 | 2004-11-30 02:56:15+00:00 | 2004-11-30 02:56:15+00:00 | Polar optical phonons in wurtzite spheroidal q... | [Vladimir A. Fonoberov, Alexander A. Balandin] | Polar optical-phonon modes are derived analyti... | 11 pages | J. Phys.: Condens. Matter 17, 1085 (2005) | 10.1088/0953-8984/17/7/003 | cond-mat.mes-hall | [cond-mat.mes-hall] | [http://dx.doi.org/10.1088/0953-8984/17/7/003,... | http://arxiv.org/pdf/cond-mat/0411742v1 | {'id': 'http://arxiv.org/abs/cond-mat/0411742v... |
3 | http://arxiv.org/abs/1403.4790v1 | 2014-03-19 13:03:49+00:00 | 2014-03-19 13:03:49+00:00 | Group-velocity slowdown in quantum-dots and qu... | [Stephan Michael, Weng W. Chow, Hans Christian... | We investigate theoretically the slowdown of o... | Physics and Simulation of Optoelectronic Devic... | None | 10.1117/12.2042412 | cond-mat.mes-hall | [cond-mat.mes-hall, cond-mat.mtrl-sci] | [http://dx.doi.org/10.1117/12.2042412, http://... | http://arxiv.org/pdf/1403.4790v1 | {'id': 'http://arxiv.org/abs/1403.4790v1', 'gu... |
4 | http://arxiv.org/abs/cond-mat/0403328v1 | 2004-03-12 18:28:06+00:00 | 2004-03-12 18:28:06+00:00 | A new method to epitaxially grow long-range or... | [J. Bauer, D. Schuh, E. Uccelli, R. Schulz, A.... | We report on a new approach for positioning of... | None | None | None | cond-mat.mes-hall | [cond-mat.mes-hall] | [http://arxiv.org/abs/cond-mat/0403328v1, http... | http://arxiv.org/pdf/cond-mat/0403328v1 | {'id': 'http://arxiv.org/abs/cond-mat/0403328v... |
5 | http://arxiv.org/abs/cond-mat/0411484v1 | 2004-11-18 16:47:14+00:00 | 2004-11-18 16:47:14+00:00 | Giant optical anisotropy in a single InAs quan... | [I. Favero, Guillaume Cassabois, A. Jankovic, ... | We present the experimental evidence of giant ... | submitted to Applied Physics Letters | None | 10.1063/1.1854733 | cond-mat.other | [cond-mat.other] | [http://dx.doi.org/10.1063/1.1854733, http://a... | http://arxiv.org/pdf/cond-mat/0411484v1 | {'id': 'http://arxiv.org/abs/cond-mat/0411484v... |
6 | http://arxiv.org/abs/1003.2350v1 | 2010-03-11 15:52:09+00:00 | 2010-03-11 15:52:09+00:00 | Linewidth broadening of a quantum dot coupled ... | [Arka Majumdar, Andrei Faraon, Erik Kim, Dirk ... | We study the coupling between a photonic cryst... | 5 pages, 4 figures | None | 10.1103/PhysRevB.82.045306 | quant-ph | [quant-ph] | [http://dx.doi.org/10.1103/PhysRevB.82.045306,... | http://arxiv.org/pdf/1003.2350v1 | {'id': 'http://arxiv.org/abs/1003.2350v1', 'gu... |
7 | http://arxiv.org/abs/1201.1258v1 | 2012-01-05 18:56:21+00:00 | 2012-01-05 18:56:21+00:00 | Photoluminescence from In0.5Ga0.5As/GaP quantu... | [Kelley Rivoire, Sonia Buckley, Yuncheng Song,... | We demonstrate room temperature visible wavele... | None | None | 10.1103/PhysRevB.85.045319 | quant-ph | [quant-ph, physics.optics] | [http://dx.doi.org/10.1103/PhysRevB.85.045319,... | http://arxiv.org/pdf/1201.1258v1 | {'id': 'http://arxiv.org/abs/1201.1258v1', 'gu... |
8 | http://arxiv.org/abs/1206.2674v1 | 2012-06-12 21:00:22+00:00 | 2012-06-12 21:00:22+00:00 | Effective microscopic theory of quantum dot su... | [U. Aeberhard] | We introduce a quantum dot orbital tight-bindi... | 9 pages, 6 figures; Special Issue: Numerical S... | Optical and Quantum Electronics 44, 133 (2012) | 10.1007/s11082-011-9529-9 | cond-mat.mes-hall | [cond-mat.mes-hall, cond-mat.mtrl-sci] | [http://dx.doi.org/10.1007/s11082-011-9529-9, ... | http://arxiv.org/pdf/1206.2674v1 | {'id': 'http://arxiv.org/abs/1206.2674v1', 'gu... |
9 | http://arxiv.org/abs/1405.1981v1 | 2014-05-08 15:51:52+00:00 | 2014-05-08 15:51:52+00:00 | A single quantum dot as an optical thermometer... | [Florian Haupt, Atac Imamoglu, Martin Kroner] | Resonant laser spectroscopy of a negatively ch... | 11 pages, 4 figures | Phys. Rev. Applied 2, 024001 (2014) | 10.1103/PhysRevApplied.2.024001 | cond-mat.mes-hall | [cond-mat.mes-hall] | [http://dx.doi.org/10.1103/PhysRevApplied.2.02... | http://arxiv.org/pdf/1405.1981v1 | {'id': 'http://arxiv.org/abs/1405.1981v1', 'gu... |
Next, we’ll create a list of all of the columns in the dataframe to see what else is there:
list(qd_df)
['entry_id',
'updated',
'published',
'title',
'authors',
'summary',
'comment',
'journal_ref',
'doi',
'primary_category',
'categories',
'links',
'pdf_url',
'_raw']
We have 14 columns overall. We’ll add two derived columns (the name of the first listed author and a reference to the original arxiv.Result object), then narrow the dataframe to paper titles, published dates, and first authors to run some analysis of publishing patterns over time.
# Add a first_author column: the name of the first author among each paper's list of authors.
qd_df['first_author'] = [authors_list[0].name for authors_list in qd_df['authors']]
# Keep a reference to the original results in the dataframe: this is useful for downloading PDFs.
qd_df['_result'] = quantum_dots
# Narrow our dataframe to just the columns we want for our analysis.
qd_df = qd_df[['title', 'published', 'first_author', '_result']]
qd_df
| | title | published | first_author | _result |
---|---|---|---|---|
0 | Excitonic properties of strained wurtzite and ... | 2003-10-15 20:15:59+00:00 | Vladimir A. Fonoberov | http://arxiv.org/abs/cond-mat/0310363v1 |
1 | A two-dimensional array of single-hole quantum... | 2020-08-26 16:48:21+00:00 | F. van Riggelen | http://arxiv.org/abs/2008.11666v1 |
2 | Polar optical phonons in wurtzite spheroidal q... | 2004-11-30 02:56:15+00:00 | Vladimir A. Fonoberov | http://arxiv.org/abs/cond-mat/0411742v1 |
3 | Group-velocity slowdown in quantum-dots and qu... | 2014-03-19 13:03:49+00:00 | Stephan Michael | http://arxiv.org/abs/1403.4790v1 |
4 | A new method to epitaxially grow long-range or... | 2004-03-12 18:28:06+00:00 | J. Bauer | http://arxiv.org/abs/cond-mat/0403328v1 |
... | ... | ... | ... | ... |
10761 | Integrated photonics enables continuous-beam e... | 2021-05-08 16:17:01+00:00 | J. -W. Henke | http://arxiv.org/abs/2105.03729v2 |
10762 | Asymptotic entanglement sudden death in two at... | 2021-05-12 14:29:44+00:00 | Gehad Sadiek | http://arxiv.org/abs/2105.05694v1 |
10763 | Engineering interfacial quantum states and ele... | 2021-07-21 15:19:10+00:00 | Ignacio Piquero-Zulaica | http://arxiv.org/abs/2107.10141v1 |
10764 | Implementation of the SMART protocol for globa... | 2021-08-02 12:46:49+00:00 | Ingvild Hansen | http://arxiv.org/abs/2108.00836v3 |
10765 | Indistinguishable single photons from spatiall... | 2021-08-03 12:00:47+00:00 | Jiefei Zhang | http://arxiv.org/abs/2108.01428v2 |
10766 rows × 4 columns
Visualize your results#
Get a sense of how your topic has trended over time. When did research on your topic take off? Create a bar chart of the number of articles published in each year.
qd_df["published"].groupby(qd_df["published"].dt.year).count().plot(kind="bar")
<matplotlib.axes._subplots.AxesSubplot at 0x12120d790>
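One caveat with a grouped count is that years with no papers do not appear in the result at all, so a bar chart drawn from it may skip those years without any visible gap. As a small sketch of one way around this, using only the standard library and a few publication dates taken from the tables on this page, you can count per year and then fill in the missing years with zeros.

```python
from collections import Counter
from datetime import datetime, timezone

# A few publication timestamps from the tables on this page, standing in
# for the full `published` column.
published = [
    datetime(2003, 10, 15, tzinfo=timezone.utc),
    datetime(2004, 3, 12, tzinfo=timezone.utc),
    datetime(2004, 11, 30, tzinfo=timezone.utc),
    datetime(2007, 11, 22, tzinfo=timezone.utc),
]

counts = Counter(ts.year for ts in published)

# A Counter returns 0 for missing keys, so every year in the range gets a value.
per_year = {year: counts[year] for year in range(min(counts), max(counts) + 1)}
print(per_year)
```

The resulting dictionary has an entry for every year between the first and last publication, which keeps the horizontal axis evenly spaced if you plot it.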

Explore authors to see who is publishing on your topic. Group by first author, then sort and select the top 20 authors.
qd_authors = qd_df.groupby(qd_df["first_author"])["first_author"].count().sort_values(ascending=False)
qd_authors.head(20)
first_author
Bing Dong 27
Y. Alhassid 20
Constantine Yannouleas 18
David M. -T. Kuo 16
Akira Oguri 15
Xuedong Hu 15
Kicheon Kang 14
B. Szafran 14
Rafael Sánchez 14
Massimo Rontani 14
Ulrich Hohenester 14
C. W. J. Beenakker 13
P. W. Brouwer 13
O. Entin-Wohlman 12
G. Giavaras 12
Vidar Gudmundsson 12
Piotr Trocha 12
Arka Majumdar 11
A. A. Aligia 11
Ramin M. Abolfath 11
Name: first_author, dtype: int64
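Counting only first authors leaves out co-authors who rarely appear first. As a hedged sketch of an alternative, the example below counts every listed author with collections.Counter. The author lists are plain strings copied from a few rows shown earlier on this page; with real results, the names would come from the name attribute of each entry in the authors column.

```python
from collections import Counter

# Author lists as plain strings, copied from a few rows shown earlier.
author_lists = [
    ["Vladimir A. Fonoberov", "Alexander A. Balandin"],
    ["Vladimir A. Fonoberov", "Alexander A. Balandin"],
    ["Florian Haupt", "Atac Imamoglu", "Martin Kroner"],
    ["U. Aeberhard"],
]

# Count every author on every paper, not just the first one listed.
all_author_counts = Counter(
    name for authors in author_lists for name in authors
)

# For comparison: the first-author-only count used in the table above.
first_author_counts = Counter(authors[0] for authors in author_lists)

print(all_author_counts["Alexander A. Balandin"])
print(first_author_counts["Alexander A. Balandin"])
```

In this small sample, a co-author who never appears first has a count of zero in the first-author tally but still shows up in the all-author tally. Counter.most_common(20) would give a top-20 list comparable to the one above.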
Identify and download papers#
Let’s download the oldest quantum dots paper with Piotr Trocha as first author:
qd_Trocha_sorted = qd_df[qd_df['first_author']=='Piotr Trocha'].sort_values('published')
qd_Trocha_sorted
| | title | published | first_author | _result |
---|---|---|---|---|
848 | Dicke-like effect in spin-polarized transport ... | 2007-11-22 16:11:11+00:00 | Piotr Trocha | http://arxiv.org/abs/0711.3611v2 |
2270 | Kondo-Dicke resonances in electronic transport... | 2008-03-28 15:49:07+00:00 | Piotr Trocha | http://arxiv.org/abs/0803.4154v1 |
3315 | Negative tunnel magnetoresistance and differen... | 2009-11-02 11:45:03+00:00 | Piotr Trocha | http://arxiv.org/abs/0911.0291v1 |
5772 | Beating in electronic transport through quantu... | 2010-04-11 16:20:04+00:00 | Piotr Trocha | http://arxiv.org/abs/1004.1819v2 |
2391 | Orbital Kondo effect in double quantum dots | 2010-08-17 14:13:23+00:00 | Piotr Trocha | http://arxiv.org/abs/1008.2902v2 |
2428 | The influence of spin-flip scattering on the p... | 2011-05-08 20:12:41+00:00 | Piotr Trocha | http://arxiv.org/abs/1105.1550v1 |
5402 | Large enhancement of thermoelectric effects in... | 2011-08-11 14:49:51+00:00 | Piotr Trocha | http://arxiv.org/abs/1108.2422v2 |
7661 | The role of the indirect tunneling processes a... | 2011-09-12 20:51:49+00:00 | Piotr Trocha | http://arxiv.org/abs/1109.2621v1 |
2613 | Spin-polarized Andreev transport influenced by... | 2014-09-14 23:54:35+00:00 | Piotr Trocha | http://arxiv.org/abs/1409.4122v1 |
7789 | Spin-resolved Andreev transport through double... | 2015-08-24 19:02:49+00:00 | Piotr Trocha | http://arxiv.org/abs/1508.05915v1 |
6015 | Spin-dependent thermoelectric phenomena in a q... | 2017-05-02 14:55:38+00:00 | Piotr Trocha | http://arxiv.org/abs/1705.01007v1 |
6436 | Cross-correlations in a quantum dot Cooper pai... | 2018-07-23 22:28:01+00:00 | Piotr Trocha | http://arxiv.org/abs/1807.08850v1 |
# Use the arxiv.Result object stored in the _result column to trigger a PDF download.
qd_Trocha_oldest = qd_Trocha_sorted.iloc[0]
qd_Trocha_oldest._result.download_pdf()
Confirm that the PDF has downloaded!
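One way to confirm the download programmatically is to check that the file exists and begins with the PDF signature bytes. The sketch below uses a hypothetical helper, looks_like_pdf, and demonstrates it on a temporary stand-in file so that it runs without a network connection.

```python
import tempfile
from pathlib import Path


def looks_like_pdf(path):
    """Return True if path is an existing file that starts with b'%PDF-'.

    Hypothetical helper for illustration; not part of the arxiv package.
    """
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return handle.read(5) == b"%PDF-"


# Offline demonstration with a temporary stand-in file instead of a real download.
with tempfile.TemporaryDirectory() as tmpdir:
    fake_pdf = Path(tmpdir) / "example.pdf"
    fake_pdf.write_bytes(b"%PDF-1.5\n% stand-in content\n")
    found = looks_like_pdf(fake_pdf)
    missing = looks_like_pdf(Path(tmpdir) / "missing.pdf")

print(found, missing)
```

Assuming download_pdf returns the path of the file it wrote, which is worth checking against the installed version of the arxiv package, that return value could be passed straight to looks_like_pdf.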
Bibliography#
Tim Head: https://betatim.github.io/posts/analysing-the-arxiv/
Lukas Schwab: https://github.com/lukasschwab/arxiv.py
arXiv API user manual: https://arxiv.org/help/api/user-manual