Migrating Source Sheets via the Sefaria API
Sefaria Source Sheets are stored in a simple, home-grown JSON format which includes some limited uses of HTML. For any source sheet you can access on Sefaria, you can see the JSON which underlies it by adding /api/ to the root of its URL path, e.g. http://www.sefaria.org/api/sheets/6.
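The URL rule above can be sketched as a small helper. This is only an illustration: the function name sheet_api_url is hypothetical, and the page URL form /sheets/6 is inferred from the example API URL rather than stated in this document.

```python
from urllib.parse import urlsplit, urlunsplit


def sheet_api_url(sheet_url: str) -> str:
    """Return the JSON API URL for a sheet page URL by prefixing its path with /api.

    Assumption: the sheet page lives at /sheets/<id>, mirroring the
    /api/sheets/<id> example. Query strings and fragments are dropped.
    """
    parts = urlsplit(sheet_url)
    path = parts.path
    if not path.startswith("/api/"):
        path = "/api" + path
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(sheet_api_url("http://www.sefaria.org/sheets/6"))
```

The resulting URL can then be fetched with any HTTP client to inspect the sheet's JSON.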
Overview
Source sheets include a number of top-level fields setting the title, publication status and overall viewing options. The bulk of the sheet is stored in an array of source objects, each of which is one of four types (Source, Outside Text, Media, Comment). Each source may additionally have its own display options.
Any textual content which a user may edit supports a subset of HTML tags: a, b, i, u, em, strong, small, p, br, div, span and img.
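Before posting, it can be useful to check that generated HTML stays inside this subset. The following is a minimal client-side sketch using Python's standard html.parser; the names ALLOWED_TAGS and disallowed_tags are hypothetical, and how the Sefaria server itself treats other tags is not described here.

```python
from html.parser import HTMLParser

# The tag subset listed in the text above.
ALLOWED_TAGS = {"a", "b", "i", "u", "em", "strong", "small",
                "p", "br", "div", "span", "img"}


class _TagCollector(HTMLParser):
    """Collects the names of all opening and self-closing tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)

    def handle_startendtag(self, tag, attrs):
        self.tags.add(tag)


def disallowed_tags(html: str) -> set:
    """Return the set of tag names in `html` that are outside ALLOWED_TAGS."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags - ALLOWED_TAGS


print(disallowed_tags("<p>Hello <b>world</b><br/></p>"))
print(disallowed_tags("<p>Hello</p><script>alert(1)</script>"))
```

This only inspects tag names; it does not check attributes.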
At a minimum, sheets must contain a title, a status and sheet-level options.
Top Level Fields
- title - HTML of the sheet title. HTML is allowed to format the sheet header, but in UI contexts on the site the title is shown stripped of HTML.
- status - either unlisted or public, whether or not the sheet is listed among public sheets for all Sefaria users.
- options - object with fields describing general options for how the sheet is viewed. While a user is viewing a sheet, they can change the value of each of these fields to (temporarily) adjust how they see the sheet. The options fields are:
  - numbered - boolean, whether to show a source number before each source.
  - boxed - boolean, whether to display a box around each source.
  - bsd - whether to show בס"ד at the top of the sheet.
  - language - what language the content of the sheet should appear in, either english, hebrew or bilingual.
  - layout - for bilingual sheets, how languages should lay out, either stacked (Hebrew on top of English) or sideBySide.
  - langLayout - for side by side bilingual sheets, which language should appear on which side, either heLeft (Hebrew on the left) or heRight (Hebrew on the right).
  - divineNames - what (if any) substitution scheme to use for the four letter name of G-d. Either noSub (no substitution), yy (substitute יי), ykvk (substitute יקוק) or h (substitute ה').
  - collaboration - determines who is allowed to edit or add to the sheet. Adding means being able to add new sources to a sheet, but not edit existing sources. Options are: none (only the owner can edit), anyone-can-add, anyone-can-edit, group-can-add, group-can-edit.
- id - a unique integer ID. To create a new sheet, post a document without an id. The server will assign an id and return it in the response. To edit an existing sheet, include its id field.
- tags - an array of strings, each representing a tag which applies to the sheet.
- group - a string naming the Sefaria group to which a sheet belongs. Only members of a given group may set this field to that group name.
- attribution - [optional] HTML that stands in place of the authorship line of the sheet. Without this field, sheets include the line "Source Sheet by X" where X is the name of the sheet owner, linked to their profile.
- promptedToPublish - an ISO timestamp representing the date when a user was prompted to publish the sheet. For sheets created external to Sefaria, setting this field to any time in the past will prevent future prompts on non-public sheets.
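As an illustration of the top-level fields, the sketch below builds a sheet dictionary and checks it against the values listed above before posting. The helper check_sheet and the table ALLOWED_OPTION_VALUES are hypothetical names; whether the server enforces exactly these rules is an assumption, so treat this as a client-side sanity check only. Option names not listed in this document are left unflagged.

```python
ALLOWED_STATUS = {"unlisted", "public"}

# Enumerated option values as listed in the text above.
ALLOWED_OPTION_VALUES = {
    "language": {"english", "hebrew", "bilingual"},
    "layout": {"stacked", "sideBySide"},
    "langLayout": {"heLeft", "heRight"},
    "divineNames": {"noSub", "yy", "ykvk", "h"},
    "collaboration": {"none", "anyone-can-add", "anyone-can-edit",
                      "group-can-add", "group-can-edit"},
}


def check_sheet(sheet: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for field in ("title", "status", "options"):
        if field not in sheet:
            problems.append(f"missing required field: {field}")
    if "status" in sheet and sheet["status"] not in ALLOWED_STATUS:
        problems.append(f"status: unexpected value {sheet['status']!r}")
    options = sheet.get("options", {})
    if not isinstance(options, dict):
        problems.append("options: expected an object")
        return problems
    for name, allowed in ALLOWED_OPTION_VALUES.items():
        if name in options and options[name] not in allowed:
            problems.append(f"options.{name}: unexpected value {options[name]!r}")
    return problems


sheet = {
    "title": "Example Sheet",  # illustrative title
    "status": "unlisted",
    "options": {"language": "bilingual", "layout": "sideBySide",
                "langLayout": "heRight", "divineNames": "noSub",
                "collaboration": "none"},
    "tags": ["example"],
    "sources": [],
}
print(check_sheet(sheet))
```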
Sources
The top-level sources field is an array of objects representing different kinds of items on the sheet. The type of each source is determined by its fields, and the type determines how the data is rendered. The options for source objects are:
Source
Determined by the presence of a ref field. A source that comes from inside the Sefaria Library. By virtue of being a part of the Sefaria library, the source automatically gets features such as looking up and adding connections, links out to the text in Sefaria, and the ability to reset the text of the source to what exists in the Sefaria database.
- ref - a string reference (a.k.a. citation) that defines the segment or range of segments in the text. See [[Text References]] for more. To determine if a ref is understood by Sefaria you can issue a query to the Texts API; if the ref is not understood it will generate an error.
- heRef - a string, the Hebrew translation of ref, shown above the Hebrew text.
- text - an object consisting of two fields, en and he, which contain the HTML of the source text itself in English and Hebrew respectively.
- title - [optional] a string that is shown as a custom title above the source.
Outside Text
Determined by the presence of either an outsideText or an outsideBiText field. This type supports including sources as free HTML which do not appear in the Sefaria library. It may be either single-language or bilingual, mirroring the structure of a source. Single-language outside texts can be used to insert any needed free-form HTML into a sheet.
- outsideText - HTML of the text for a single language case.
- outsideBiText - an object consisting of the fields en and he, which contain HTML of the English and Hebrew of the text respectively.
Comment
Determined by the presence of the field comment. Comments are free HTML like single language outside texts, but are rendered in the interface with a comment icon and may in the future link to more social features.
- comment - HTML of the comment text.
Media
Determined by the presence of the field media. This type may be an embedded image, MP3 or YouTube video.
- media - the URL of the media resource. How the media is rendered is determined by the URL's extension/domain:
  - Image: if the URL ends with .jpg, .jpeg, .gif or .png, it is rendered in an <img> tag.
  - MP3: if the URL ends with .mp3 or is hosted on clyp.it, it is rendered in an <audio> tag.
  - Video: if the URL is to a YouTube video, it is rendered as an <iframe> to YouTube.
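The field-based typing of sources and the media rendering rules can be sketched as two small classifiers. Everything here is an illustration under stated assumptions: the function names and returned labels are hypothetical, the order of checks for a source carrying several of these fields is a guess, the extension test is applied to the URL path (ignoring any query string), and the YouTube host names are assumed since the text only says "a YouTube video".

```python
from urllib.parse import urlsplit


def source_type(source: dict) -> str:
    """Classify a source object by which identifying field it carries."""
    if "ref" in source:
        return "source"
    if "outsideText" in source or "outsideBiText" in source:
        return "outsideText"
    if "comment" in source:
        return "comment"
    if "media" in source:
        return "media"
    return "unknown"


def media_kind(url: str) -> str:
    """Guess how a media URL would be rendered: image, audio, video or unknown."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path.lower()
    if path.endswith((".jpg", ".jpeg", ".gif", ".png")):
        return "image"
    if path.endswith(".mp3") or host == "clyp.it" or host.endswith(".clyp.it"):
        return "audio"
    # Assumed YouTube hosts; not specified in the text above.
    if host == "youtu.be" or host == "youtube.com" or host.endswith(".youtube.com"):
        return "video"
    return "unknown"


print(source_type({"ref": "Genesis 1:1"}))
print(source_type({"outsideBiText": {"en": "<p>English</p>", "he": "<p>עברית</p>"}}))
print(media_kind("https://example.com/images/page.png"))
```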
Source Level Options
Each source object may optionally include an options field with an object storing source-level options which override the default display of the source. These options are:
- sourceLanguage - what language to display the source content in, either hebrew, english or bilingual.
- sourceLayout - for bilingual sources, whether the layout should be stacked or sideBySide.
- sourceLangLayout - for side by side bilingual sources, on which side the Hebrew text should display. Either heLeft for Hebrew on the left, or heRight for Hebrew on the right.
- indented - sets an indentation level for a source. Three levels of indentation are supported with the three values indented-1, indented-2 and indented-3.
- sourcePrefix - a string that contains content that can be displayed as marginalia.
- PrependRefWithEn - a string that contains content to prepend before the English reference of a source.
- PrependRefWithHe - a string that contains content to prepend before the Hebrew reference of a source.
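A library source with source-level options might be assembled as in the sketch below. The helper make_source is hypothetical, and the ref, heRef and text values are illustrative placeholders rather than data taken from Sefaria.

```python
import json

INDENT_LEVELS = ("indented-1", "indented-2", "indented-3")


def make_source(ref, he_ref, en_html, he_html, title=None, **options):
    """Build a Source object; keyword arguments become its source-level options."""
    if "indented" in options and options["indented"] not in INDENT_LEVELS:
        raise ValueError(f"indented must be one of {INDENT_LEVELS}")
    source = {"ref": ref, "heRef": he_ref, "text": {"en": en_html, "he": he_html}}
    if title is not None:
        source["title"] = title
    if options:
        # Only attach an options object when at least one override is given.
        source["options"] = options
    return source


source = make_source(
    "Genesis 1:1",                # illustrative ref
    "בראשית א׳:א׳",               # illustrative heRef
    "<p>English text here</p>",   # placeholder English HTML
    "<p>טקסט עברי כאן</p>",       # placeholder Hebrew HTML
    sourceLanguage="bilingual",
    sourceLayout="sideBySide",
    sourceLangLayout="heRight",
    indented="indented-1",
)
print(json.dumps(source, ensure_ascii=False, indent=2))
```

The resulting dictionary can be appended to the sheet's sources array.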
Fields Set only by the Server
A number of fields are set only internally in Sefaria's database and cannot be set except in the process of saving a sheet within Sefaria. These include:
- _id - a unique Mongo ID.
- owner - user ID of the sheet's owner.
- views - integer number of times the sheet has been viewed.
- likes - array of integer user IDs of users who like this sheet.
- dateCreated - ISO timestamp when the document was first created.
- dateModified - ISO timestamp when the document last changed.
- lastModified - ISO timestamp of the previous modification time, used to check if a sheet has been updated since the client last received its data.
- nextNode - integer used to track the next node ID to assign to a new source. Node IDs are used only for real time collaborative editing.
- [on a source] node - integer ID of the source used in real time collaborative editing.
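When a sheet fetched from the API is used as the basis for a new post, one cautious approach is to drop these server-set fields first. This is a sketch under assumptions: the name strip_server_fields is hypothetical, and this document does not say whether the server ignores, rejects or expects any of these fields on input (lastModified in particular is described as being used for an update check), so the keep parameter lets callers retain specific fields.

```python
import copy

# Server-set sheet fields as listed in the text above.
SERVER_ONLY_SHEET_FIELDS = ("_id", "owner", "views", "likes", "dateCreated",
                            "dateModified", "lastModified", "nextNode")


def strip_server_fields(sheet: dict, keep=()) -> dict:
    """Return a deep copy of `sheet` without server-set fields, except those in `keep`."""
    cleaned = copy.deepcopy(sheet)
    for field in SERVER_ONLY_SHEET_FIELDS:
        if field not in keep:
            cleaned.pop(field, None)
    if "node" not in keep:
        # The per-source node ID is also assigned by the server.
        for source in cleaned.get("sources", []):
            source.pop("node", None)
    return cleaned


fetched = {
    "title": "Example Sheet",  # illustrative data, not a real API response
    "status": "unlisted",
    "options": {},
    "views": 12,
    "owner": 1,
    "nextNode": 2,
    "sources": [{"comment": "<p>Note</p>", "node": 1}],
}
print(strip_server_fields(fetched))
```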
Example Script
# -*- coding: utf-8 -*-
"""
Post sheets to Sefaria using an API key.
"""
import json

import requests

# Fill in with your API key. To obtain an API key, contact the Sefaria team.
API_KEY = "API_KEY"

# Each entry needs at least a "title"; any further per-sheet data is elided here.
sheets = [
    {"title": "Source Sheet 1 Title"},
    {"title": "Source Sheet 2 Title"},
]

for sheet in sheets:
    sheet_json = {
        "status": "public",
        "title": sheet["title"],
        "sources": [],
        "options": {
            "numbered": 0, "assignable": 0, "layout": "sideBySide", "boxed": 0,
            "language": "bilingual", "divineNames": "noSub", "collaboration": "none",
            "highlightMode": 0, "bsd": 0, "langLayout": "heRight",
        },
    }
    # The sheet is sent as a JSON string in the "json" form field.
    values = {"json": json.dumps(sheet_json), "apikey": API_KEY}
    try:
        response = requests.post("https://www.sefaria.org/api/sheets", data=values)
        response.raise_for_status()  # raise on 4xx/5xx responses
        print("Sheet posted.")
        print(response.json())
    except requests.exceptions.RequestException as e:
        # Covers connection failures and HTTP error statuses.
        print(e)