arXiv API#
The arXiv API allows programmatic access to arXiv’s e-print content and metadata. “The goal of the interface is to facilitate new and creative use of the vast body of material on the arXiv by providing a low barrier to entry for application developers.” https://arxiv.org/help/api
The API’s user manual (https://arxiv.org/help/api/user-manual) provides helpful documentation for using the API and retrieving article metadata.
The examples below introduce the basics of querying the arXiv API.
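Under the hood, the API is a single HTTP endpoint that returns an Atom feed. As a minimal sketch, assuming the `http://export.arxiv.org/api/query` endpoint and the `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters described in the user manual, a query URL can be assembled with the standard library alone. The helper names `build_query_url` and `fetch_feed` are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base endpoint as given in the arXiv API user manual (assumed unchanged).
API_ENDPOINT = "http://export.arxiv.org/api/query"


def build_query_url(search_query, start=0, max_results=10,
                    sort_by="submittedDate", sort_order="descending"):
    """Return the full query URL for the given search parameters."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def fetch_feed(url, timeout=30):
    """Download the Atom XML for a query URL (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Build (but do not send) a query for the ten newest "graphene" papers.
url = build_query_url("all:graphene")
print(url)
```

Calling `fetch_feed(url)` would return the raw Atom XML. The `arxiv` package used in the rest of this tutorial handles that request and the parsing for you.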
Import packages#
import urllib
import arxiv
import requests
import json
import csv
import pandas as pd
from collections import Counter, defaultdict
import numpy as np # for array manipulation
import matplotlib.pyplot as plt # for data visualization
%matplotlib inline
import datetime
Query the API#
Perform a simple query for “graphene.” We’ll limit results to the titles of the 10 most recent papers.
search = arxiv.Search(
    query="graphene",
    max_results=10,
    sort_by=arxiv.SortCriterion.SubmittedDate
)
# Search.results() is deprecated; fetch results through a Client instead.
client = arxiv.Client()
for result in client.results(search):
    print(result.title)
Electrical Spectroscopy of Polaritonic Nanoresonators
Quantum nuclear motion in silicene: Assessing structural and vibrational properties through path-integral simulations
Classification of mass terms in kagome semimetals
Spin Seebeck Effect in Graphene
Kekulé distortions in graphene on cadmium sulfide
Topological chiral superconductivity
High- and low-energy many-body effects of graphene in a unified approach
Coupled Spin and Valley Hall Effects Driven by Coherent Tunneling
Multichannel joint-polarization-frequency-modulation encrypted metasurface in secure THz communication
Detection of terahertz radiation using topological graphene micro-nanoribbon structures with transverse plasmonic resonant cavities
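The query string is not limited to a bare keyword. The user manual describes field prefixes such as `ti` (title), `au` (author), `abs` (abstract), `cat` (subject category), and `all`, which can be combined with the Boolean operators `AND`, `OR`, and `ANDNOT`. The sketch below builds such strings; `field_term` and `all_of` are hypothetical helpers, not part of the `arxiv` package.

```python
def field_term(prefix, value):
    """Build one fielded term, quoting multi-word phrases."""
    if " " in value:
        value = f'"{value}"'
    return f"{prefix}:{value}"


def all_of(*terms):
    """Join terms with AND so that every term must match."""
    return " AND ".join(terms)


# Papers with "graphene" in the title, restricted to one subject category.
query = all_of(
    field_term("ti", "graphene"),
    field_term("cat", "cond-mat.mes-hall"),
)
print(query)

# A quoted phrase searched in abstracts only.
phrase_query = field_term("abs", "quantum dots")
print(phrase_query)
```

Either string can be passed as the `query` argument of `arxiv.Search`.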
Do another query for the topic “quantum dots,” but note that you could swap in a topic of your liking.
You can define a custom arXiv API client with specialized pagination behavior. This time we’ll process each paper as it’s fetched rather than exhausting the results generator into a list; this is useful for running analysis while the client sleeps between page requests.
Because this arxiv.Search doesn’t bound the number of results with max_results, it will fetch every matching paper (more than 12,000 when this example was run). This may take several minutes.
results_generator = arxiv.Client(
    page_size=1000,
    delay_seconds=3,
    num_retries=3
).results(arxiv.Search(
    query='"quantum dots"',
    id_list=[],
    sort_by=arxiv.SortCriterion.Relevance,
    sort_order=arxiv.SortOrder.Descending,
))
quantum_dots = []
for paper in results_generator:
    # You could do per-paper analysis here; for now, just collect them in a list.
    quantum_dots.append(paper)
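Because the client yields papers one at a time, analysis can run inside the loop, and `itertools.islice` can cap how many results are consumed without changing the search definition. The sketch below tallies publication years as papers arrive. To stay runnable offline it uses a stand-in generator, `fake_results`, and a stand-in `Paper` record; both are hypothetical, and in practice you would pass `results_generator` instead.

```python
from collections import Counter, namedtuple
from datetime import datetime, timezone
from itertools import islice

# Stand-in for arxiv.Result: only the fields this sketch needs (hypothetical).
Paper = namedtuple("Paper", ["title", "published"])


def fake_results():
    """Yield a few stand-in papers, mimicking a lazy results generator."""
    yield Paper("Paper A", datetime(2003, 10, 15, tzinfo=timezone.utc))
    yield Paper("Paper B", datetime(2004, 3, 12, tzinfo=timezone.utc))
    yield Paper("Paper C", datetime(2004, 11, 30, tzinfo=timezone.utc))
    yield Paper("Paper D", datetime(2020, 8, 26, tzinfo=timezone.utc))


def count_by_year(results, limit=None):
    """Tally publication years, reading at most `limit` results."""
    years = Counter()
    # islice stops early, so the generator never fetches more than needed.
    for paper in islice(results, limit):
        years[paper.published.year] += 1
    return years


print(count_by_year(fake_results()))
print(count_by_year(fake_results(), limit=2))
```

Capping with `islice` is handy for a quick trial run before committing to the full download.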
Organize and analyze your results#
Create a dataframe to better analyze your results. This example uses Python’s built-in vars function to convert search results into Python dictionaries of paper metadata.
qd_df = pd.DataFrame([vars(paper) for paper in quantum_dots])
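`vars(obj)` returns the object's attribute dictionary, including underscore-prefixed attributes, which is presumably why a `_raw` column shows up in the dataframe. A minimal illustration with a stand-in class (`FakeResult` is hypothetical and far simpler than `arxiv.Result`):

```python
class FakeResult:
    """Stand-in for a search result with one public and one private field."""

    def __init__(self, title, raw):
        self.title = title
        self._raw = raw  # underscore-prefixed attributes are included too


papers = [
    FakeResult("Paper A", {"id": "a"}),
    FakeResult("Paper B", {"id": "b"}),
]

# One dictionary per paper: the list-of-rows shape that pd.DataFrame accepts.
rows = [vars(paper) for paper in papers]
print(rows[0])
```

Each dictionary key becomes a column name, so every instance attribute ends up as a column.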
We’ll look at the first 10 results.
qd_df.head(10)
 | entry_id | updated | published | title | authors | summary | comment | journal_ref | doi | primary_category | categories | links | pdf_url | _raw |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | http://arxiv.org/abs/cond-mat/0310363v1 | 2003-10-15 20:15:59+00:00 | 2003-10-15 20:15:59+00:00 | Excitonic properties of strained wurtzite and ... | [Vladimir A. Fonoberov, Alexander A. Balandin] | We investigate exciton states theoretically in... | 18 pages, accepted for publication in the Jour... | J. Appl. Phys. 94, 7178 (2003) | 10.1063/1.1623330 | cond-mat.mes-hall | [cond-mat.mes-hall] | [http://dx.doi.org/10.1063/1.1623330, http://a... | http://arxiv.org/pdf/cond-mat/0310363v1 | {'id': 'http://arxiv.org/abs/cond-mat/0310363v... |
1 | http://arxiv.org/abs/2008.11666v1 | 2020-08-26 16:48:21+00:00 | 2020-08-26 16:48:21+00:00 | A two-dimensional array of single-hole quantum... | [F. van Riggelen, N. W. Hendrickx, W. I. L. La... | Quantum dots fabricated using techniques and m... | 7 pages, 4 figures | None | 10.1063/5.0037330 | cond-mat.mes-hall | [cond-mat.mes-hall] | [http://dx.doi.org/10.1063/5.0037330, http://a... | http://arxiv.org/pdf/2008.11666v1 | {'id': 'http://arxiv.org/abs/2008.11666v1', 'g... |
2 | http://arxiv.org/abs/cond-mat/0411742v1 | 2004-11-30 02:56:15+00:00 | 2004-11-30 02:56:15+00:00 | Polar optical phonons in wurtzite spheroidal q... | [Vladimir A. Fonoberov, Alexander A. Balandin] | Polar optical-phonon modes are derived analyti... | 11 pages | J. Phys.: Condens. Matter 17, 1085 (2005) | 10.1088/0953-8984/17/7/003 | cond-mat.mes-hall | [cond-mat.mes-hall] | [http://dx.doi.org/10.1088/0953-8984/17/7/003,... | http://arxiv.org/pdf/cond-mat/0411742v1 | {'id': 'http://arxiv.org/abs/cond-mat/0411742v... |
3 | http://arxiv.org/abs/1403.4790v1 | 2014-03-19 13:03:49+00:00 | 2014-03-19 13:03:49+00:00 | Group-velocity slowdown in quantum-dots and qu... | [Stephan Michael, Weng W. Chow, Hans Christian... | We investigate theoretically the slowdown of o... | Physics and Simulation of Optoelectronic Devic... | None | 10.1117/12.2042412 | cond-mat.mes-hall | [cond-mat.mes-hall, cond-mat.mtrl-sci] | [http://dx.doi.org/10.1117/12.2042412, http://... | http://arxiv.org/pdf/1403.4790v1 | {'id': 'http://arxiv.org/abs/1403.4790v1', 'gu... |
4 | http://arxiv.org/abs/cond-mat/0403328v1 | 2004-03-12 18:28:06+00:00 | 2004-03-12 18:28:06+00:00 | A new method to epitaxially grow long-range or... | [J. Bauer, D. Schuh, E. Uccelli, R. Schulz, A.... | We report on a new approach for positioning of... | None | None | None | cond-mat.mes-hall | [cond-mat.mes-hall] | [http://arxiv.org/abs/cond-mat/0403328v1, http... | http://arxiv.org/pdf/cond-mat/0403328v1 | {'id': 'http://arxiv.org/abs/cond-mat/0403328v... |
5 | http://arxiv.org/abs/cond-mat/0411484v1 | 2004-11-18 16:47:14+00:00 | 2004-11-18 16:47:14+00:00 | Giant optical anisotropy in a single InAs quan... | [I. Favero, Guillaume Cassabois, A. Jankovic, ... | We present the experimental evidence of giant ... | submitted to Applied Physics Letters | None | 10.1063/1.1854733 | cond-mat.other | [cond-mat.other] | [http://dx.doi.org/10.1063/1.1854733, http://a... | http://arxiv.org/pdf/cond-mat/0411484v1 | {'id': 'http://arxiv.org/abs/cond-mat/0411484v... |
6 | http://arxiv.org/abs/1003.2350v1 | 2010-03-11 15:52:09+00:00 | 2010-03-11 15:52:09+00:00 | Linewidth broadening of a quantum dot coupled ... | [Arka Majumdar, Andrei Faraon, Erik Kim, Dirk ... | We study the coupling between a photonic cryst... | 5 pages, 4 figures | None | 10.1103/PhysRevB.82.045306 | quant-ph | [quant-ph] | [http://dx.doi.org/10.1103/PhysRevB.82.045306,... | http://arxiv.org/pdf/1003.2350v1 | {'id': 'http://arxiv.org/abs/1003.2350v1', 'gu... |
7 | http://arxiv.org/abs/1201.1258v1 | 2012-01-05 18:56:21+00:00 | 2012-01-05 18:56:21+00:00 | Photoluminescence from In0.5Ga0.5As/GaP quantu... | [Kelley Rivoire, Sonia Buckley, Yuncheng Song,... | We demonstrate room temperature visible wavele... | None | None | 10.1103/PhysRevB.85.045319 | quant-ph | [quant-ph, physics.optics] | [http://dx.doi.org/10.1103/PhysRevB.85.045319,... | http://arxiv.org/pdf/1201.1258v1 | {'id': 'http://arxiv.org/abs/1201.1258v1', 'gu... |
8 | http://arxiv.org/abs/1206.2674v1 | 2012-06-12 21:00:22+00:00 | 2012-06-12 21:00:22+00:00 | Effective microscopic theory of quantum dot su... | [U. Aeberhard] | We introduce a quantum dot orbital tight-bindi... | 9 pages, 6 figures; Special Issue: Numerical S... | Optical and Quantum Electronics 44, 133 (2012) | 10.1007/s11082-011-9529-9 | cond-mat.mes-hall | [cond-mat.mes-hall, cond-mat.mtrl-sci] | [http://dx.doi.org/10.1007/s11082-011-9529-9, ... | http://arxiv.org/pdf/1206.2674v1 | {'id': 'http://arxiv.org/abs/1206.2674v1', 'gu... |
9 | http://arxiv.org/abs/1405.1981v1 | 2014-05-08 15:51:52+00:00 | 2014-05-08 15:51:52+00:00 | A single quantum dot as an optical thermometer... | [Florian Haupt, Atac Imamoglu, Martin Kroner] | Resonant laser spectroscopy of a negatively ch... | 11 pages, 4 figures | Phys. Rev. Applied 2, 024001 (2014) | 10.1103/PhysRevApplied.2.024001 | cond-mat.mes-hall | [cond-mat.mes-hall] | [http://dx.doi.org/10.1103/PhysRevApplied.2.02... | http://arxiv.org/pdf/1405.1981v1 | {'id': 'http://arxiv.org/abs/1405.1981v1', 'gu... |
Next, we’ll create a list of all the columns in the dataframe to see what else is there:
list(qd_df)
['entry_id',
'updated',
'published',
'title',
'authors',
'summary',
'comment',
'journal_ref',
'doi',
'primary_category',
'categories',
'links',
'pdf_url',
'_raw']
We have 14 columns overall. We’ll add two derived columns (the name of the first listed author and a reference to the original arxiv.Result object), then narrow the dataframe to paper titles, published dates, first authors, and that reference to run some analysis of publishing patterns over time.
# Add a first_author column: the name of the first author among each paper's list of authors.
qd_df['first_author'] = [authors_list[0].name for authors_list in qd_df['authors']]
# Keep a reference to the original results in the dataframe: this is useful for downloading PDFs.
qd_df['_result'] = quantum_dots
# Narrow our dataframe to just the columns we want for our analysis.
qd_df = qd_df[['title', 'published', 'first_author', '_result']]
qd_df
 | title | published | first_author | _result |
---|---|---|---|---|
0 | Excitonic properties of strained wurtzite and ... | 2003-10-15 20:15:59+00:00 | Vladimir A. Fonoberov | http://arxiv.org/abs/cond-mat/0310363v1 |
1 | A two-dimensional array of single-hole quantum... | 2020-08-26 16:48:21+00:00 | F. van Riggelen | http://arxiv.org/abs/2008.11666v1 |
2 | Polar optical phonons in wurtzite spheroidal q... | 2004-11-30 02:56:15+00:00 | Vladimir A. Fonoberov | http://arxiv.org/abs/cond-mat/0411742v1 |
3 | Group-velocity slowdown in quantum-dots and qu... | 2014-03-19 13:03:49+00:00 | Stephan Michael | http://arxiv.org/abs/1403.4790v1 |
4 | A new method to epitaxially grow long-range or... | 2004-03-12 18:28:06+00:00 | J. Bauer | http://arxiv.org/abs/cond-mat/0403328v1 |
... | ... | ... | ... | ... |
12426 | Electric Field Control of Molecular Charge Sta... | 2024-05-29 08:10:14+00:00 | Dhaneesh Kumar | http://arxiv.org/abs/2405.18855v1 |
12427 | Experimental single-photon quantum key distrib... | 2024-06-04 07:28:15+00:00 | Yang Zhang | http://arxiv.org/abs/2406.02045v1 |
12428 | Assessment of error variation in high-fidelity... | 2023-03-07 17:50:32+00:00 | Tuomo Tanttu | http://arxiv.org/abs/2303.04090v3 |
12429 | Functional light diffusers based on hybrid CsP... | 2023-07-12 14:36:29+00:00 | Lena M. Saure | http://arxiv.org/abs/2307.06197v1 |
12430 | Interferometric Single-Shot Parity Measurement... | 2024-01-17 19:05:29+00:00 | Morteza Aghaee | http://arxiv.org/abs/2401.09549v4 |
12431 rows × 4 columns
Visualize your results#
Get a sense of how your topic has trended over time. When did research on your topic take off? Create a bar chart of the number of articles published in each year.
qd_df["published"].groupby(qd_df["published"].dt.year).count().plot(kind="bar")
<AxesSubplot:xlabel='published'>
Explore authors to see who is publishing on your topic. Group by first author, then sort and select the top 20 authors.
qd_authors = qd_df.groupby(qd_df["first_author"])["first_author"].count().sort_values(ascending=False)
qd_authors.head(20)
first_author
Bing Dong 27
Constantine Yannouleas 23
Y. Alhassid 20
Vidar Gudmundsson 17
Akira Oguri 16
David M. -T. Kuo 16
B. Szafran 16
Xuedong Hu 15
Rafael Sánchez 14
Kicheon Kang 14
Massimo Rontani 14
Ulrich Hohenester 14
P. W. Brouwer 13
C. W. J. Beenakker 13
David M T Kuo 13
G. Giavaras 12
Rui Li 12
Tetsufumi Tanamoto 12
O. Entin-Wohlman 12
Piotr Trocha 12
Name: first_author, dtype: int64
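Note that the tally treats spelling variants as different people: `David M. -T. Kuo` and `David M T Kuo` appear as separate entries above. One rough remedy is to group on a normalized key. The sketch below is a heuristic under stated assumptions (it strips periods and hyphens and ignores case), and it can wrongly merge distinct authors who share a name. `name_key` and `merge_counts` are hypothetical helpers.

```python
import re
from collections import Counter


def name_key(name):
    """Reduce a name to a comparison key: no periods or hyphens, lowercase."""
    cleaned = re.sub(r"[.\-]", " ", name)
    return " ".join(cleaned.split()).casefold()


def merge_counts(counts):
    """Sum the counts of all names that share the same normalized key."""
    merged = Counter()
    for name, n in counts.items():
        merged[name_key(name)] += n
    return merged


# Counts copied from the top-20 output above.
counts = {"David M. -T. Kuo": 16, "David M T Kuo": 13, "Bing Dong": 27}
merged = merge_counts(counts)
print(merged.most_common(2))
```

In the notebook, the same key could be stored in a new column and used for the group-by instead of `first_author`.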
Identify and download papers#
Let’s download the oldest paper about quantum dots with Piotr Trocha as first author:
qd_Trocha_sorted = qd_df[qd_df['first_author']=='Piotr Trocha'].sort_values('published')
qd_Trocha_sorted
 | title | published | first_author | _result |
---|---|---|---|---|
820 | Dicke-like effect in spin-polarized transport ... | 2007-11-22 16:11:11+00:00 | Piotr Trocha | http://arxiv.org/abs/0711.3611v2 |
2507 | Kondo-Dicke resonances in electronic transport... | 2008-03-28 15:49:07+00:00 | Piotr Trocha | http://arxiv.org/abs/0803.4154v1 |
3642 | Negative tunnel magnetoresistance and differen... | 2009-11-02 11:45:03+00:00 | Piotr Trocha | http://arxiv.org/abs/0911.0291v1 |
5845 | Beating in electronic transport through quantu... | 2010-04-11 16:20:04+00:00 | Piotr Trocha | http://arxiv.org/abs/1004.1819v2 |
2664 | Orbital Kondo effect in double quantum dots | 2010-08-17 14:13:23+00:00 | Piotr Trocha | http://arxiv.org/abs/1008.2902v2 |
2712 | The influence of spin-flip scattering on the p... | 2011-05-08 20:12:41+00:00 | Piotr Trocha | http://arxiv.org/abs/1105.1550v1 |
6016 | Large enhancement of thermoelectric effects in... | 2011-08-11 14:49:51+00:00 | Piotr Trocha | http://arxiv.org/abs/1108.2422v2 |
8545 | The role of the indirect tunneling processes a... | 2011-09-12 20:51:49+00:00 | Piotr Trocha | http://arxiv.org/abs/1109.2621v1 |
2858 | Spin-polarized Andreev transport influenced by... | 2014-09-14 23:54:35+00:00 | Piotr Trocha | http://arxiv.org/abs/1409.4122v1 |
8632 | Spin-resolved Andreev transport through double... | 2015-08-24 19:02:49+00:00 | Piotr Trocha | http://arxiv.org/abs/1508.05915v1 |
6741 | Spin-dependent thermoelectric phenomena in a q... | 2017-05-02 14:55:38+00:00 | Piotr Trocha | http://arxiv.org/abs/1705.01007v1 |
6872 | Cross-correlations in a quantum dot Cooper pai... | 2018-07-23 22:28:01+00:00 | Piotr Trocha | http://arxiv.org/abs/1807.08850v1 |
# Use the arxiv.Result object stored in the _result column to trigger a PDF download.
qd_Trocha_oldest = qd_Trocha_sorted.iloc[0]
qd_Trocha_oldest._result.download_pdf()
'./0711.3611v2.Dicke_like_effect_in_spin_polarized_transport_through_coupled_quantum_dots.pdf'
Confirm that the PDF has downloaded!
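One way to confirm the download programmatically is to check that the file exists and begins with the `%PDF-` signature that PDF files start with. This is a minimal sketch: `looks_like_pdf` is a hypothetical helper, and the temporary file below only stands in for a real download so the example runs offline.

```python
import tempfile
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file exists and starts with the PDF magic bytes."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return handle.read(5) == b"%PDF-"


# Demonstrate with a stand-in file and a path that does not exist.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.pdf"
    sample.write_bytes(b"%PDF-1.4\n% stand-in content\n")
    sample_ok = looks_like_pdf(sample)
    missing_ok = looks_like_pdf(Path(tmp) / "missing.pdf")

print(sample_ok, missing_ok)
```

In the notebook, you would pass the path returned by `download_pdf()` to the helper.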
Bibliography#
Tim Head: https://betatim.github.io/posts/analysing-the-arxiv/
Lukas Schwab: arxiv.py (GitHub repository lukasschwab/arxiv.py)
arXiv API user manual: https://arxiv.org/help/api/user-manual