Linker API
The Sefaria Linker relies on a POST API, documented below.
Introduction
The API takes text as input and returns the location of each citation in the text, along with a Sefaria link for it. The table below summarizes current support by language.
Language | Support |
---|---|
English | Basic support. Relies on a regular-expression algorithm that can produce many false positives and false negatives. Works best on sources with simple citation formats, such as Tanakh, Mishnah, and Talmud, and poorly on books with complicated titles or citation formats. |
Hebrew | Relies on machine-learning models, designed to limit false positives. Supports ibid citations. Returns multiple candidate refs when a citation is ambiguous. |
API
POST /api/find-refs
This endpoint takes text as input and returns the location and a Sefaria link for each citation.
URL parameters
URL param | Description | Type | Default |
---|---|---|---|
with_text | Return the text for each citation. See the with_text format section for details. | 0 or 1 | 0 |
debug | Return debug information for each citation. See the debug format section for details. | 0 or 1 | 0 |
max_segments | When with_text is 1, the maximum number of segments of text to return per citation. Limits the size of the response for broad citations such as פרשת בראשית (Parashat Bereshit). | int; 0 means no limit | 0 |
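A minimal JavaScript sketch of assembling these URL parameters, assuming a runtime with the standard URL class (modern browsers or Node). The helper name buildFindRefsUrl is hypothetical. It omits any parameter left at its default, and it drops max_segments when with_text is off, since the table above says max_segments only applies when with_text is 1.

```javascript
// Hypothetical helper: build the find-refs URL, adding only parameters that
// differ from their documented defaults (all default to 0).
function buildFindRefsUrl({ withText = false, debug = false, maxSegments = 0 } = {}) {
  const url = new URL("https://www.sefaria.org/api/find-refs");
  if (withText) url.searchParams.set("with_text", "1");
  if (debug) url.searchParams.set("debug", "1");
  // max_segments only matters when with_text is 1; 0 means no limit.
  if (withText && maxSegments > 0) {
    url.searchParams.set("max_segments", String(maxSegments));
  }
  return url.toString();
}

console.log(buildFindRefsUrl({ withText: true, debug: true, maxSegments: 5 }));
// https://www.sefaria.org/api/find-refs?with_text=1&debug=1&max_segments=5
```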
POST body
The POST body should be serialized JSON with the following field:
Field | Description | Required |
---|---|---|
text | The text to search for citations, as an object with title and body fields. See the text format section for details. | Yes |
Response format
The response is JSON in the following format. See the Example section for a complete response.
{
  "title": {
    // data corresponding to citations found in the "title" field of the POST body's "text" object
    "results": [
      {
        "startChar": <int>, // index of the citation's first character in the input text
        "endChar": <int>, // index just past the citation's last character in the input text
        "text": <str>, // text of the citation. Equal to text.title.substring(startChar, endChar), where text is the "text" field of the POST body
        "linkFailed": <bool>, // true if the citation was found but could not be linked to a source on Sefaria. This can mean an error in the linker or that the source doesn't exist on Sefaria.
        "refs": [<str>] // references in Sefaria format that this citation could be linked to. More than one reference means the citation was considered ambiguous.
      }
    ],
    "refData": {
      // object with a key for every ref in "results"
      <ref>: {
        "heRef": <str>, // Hebrew form of the reference
        "url": <str>, // reference in URL form. Build a Sefaria URL as https://www.sefaria.org/<url>
        "primaryCategory": <str>, // primary category of the reference on Sefaria. Corresponds to its location in the table of contents (https://www.sefaria.org/texts).
        "he": [<str>], // Hebrew text of <ref>, one string per segment. Only included if the `with_text` URL param is `1`.
        "en": [<str>], // English text of <ref>, one string per segment. Only included if the `with_text` URL param is `1`.
        "isTruncated": <bool> // whether the text was truncated. Only included if the `with_text` URL param is `1` and the `max_segments` URL param is greater than 0.
      }
    },
    "debugData": [] // debug data where each element corresponds to an element of "results". Only included if the `debug` URL param is `1`. See the debug format section.
  },
  "body": {
    // data corresponding to citations found in the "body" field of the POST body's "text" object
    // same format as the "title" object
  }
}
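As an illustration of walking this structure, the hypothetical helper below (plain JavaScript, not part of any Sefaria library) maps one section of the response to a list of citations with full Sefaria URLs, and checks that text equals the substring of the input between startChar and endChar. The sample data is abridged from the Example section.

```javascript
// Hypothetical helper: flatten one response section ("title" or "body") into
// citations with full Sefaria URLs. inputText is the string that was sent in
// the matching field of the POST body.
function summarizeCitations(inputText, section) {
  return section.results.map((result) => ({
    text: result.text,
    // startChar/endChar index into inputText, so this should be true.
    offsetsMatch:
      inputText.substring(result.startChar, result.endChar) === result.text,
    linkFailed: result.linkFailed,
    // More than one candidate ref means the citation was ambiguous.
    ambiguous: result.refs.length > 1,
    urls: result.refs
      .filter((ref) => ref in section.refData)
      .map((ref) => "https://www.sefaria.org/" + section.refData[ref].url),
  }));
}

// Abridged "body" section of the Example response.
const body = "ראה מה שכתוב בפסוק א.";
const bodySection = {
  results: [
    { startChar: 13, endChar: 20, text: "בפסוק א", linkFailed: false, refs: ["Job 17:1"] },
  ],
  refData: { "Job 17:1": { url: "Job.17.1", primaryCategory: "Tanakh" } },
};
console.log(summarizeCitations(body, bodySection));
```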
with_text format
When the with_text URL param is 1, the following keys are added to each entry of response.title.refData and response.body.refData.
Field | Description |
---|---|
he | Hebrew text of <ref>, where <ref> is the key of the refData entry. |
en | English text of <ref>, where <ref> is the key of the refData entry. |
isTruncated | Whether the text was truncated because of the max_segments URL param. Only included when max_segments is greater than 0. |
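A small sketch, in JavaScript with hypothetical helper names, of reading these keys defensively: he and en are missing when with_text is 0, and isTruncated is missing when max_segments is 0 (no limit), so false is a safe default for it. The sample entry is the Job 17:1 entry from the Example section, trimmed. Note that the English text in that example contains HTML markup such as `<br/>`.

```javascript
// Hypothetical helper: normalize the with_text fields of one refData entry.
function getRefText(entry) {
  return {
    he: entry.he ?? [], // absent unless with_text=1
    en: entry.en ?? [], // absent unless with_text=1
    // Absent unless with_text=1 and max_segments > 0; with no limit,
    // nothing was truncated, so default to false.
    isTruncated: entry.isTruncated ?? false,
  };
}

// Hypothetical helper: join an entry's segments in one language ("he" or
// "en") for display, marking the end when max_segments cut the text short.
function formatRefText(entry, lang) {
  const text = getRefText(entry);
  const joined = text[lang].join("\n");
  return text.isTruncated ? joined + "\n[...]" : joined;
}

// Trimmed copy of the "Job 17:1" refData entry from the Example section.
const job17v1 = {
  url: "Job.17.1",
  primaryCategory: "Tanakh",
  en: ["My spirit is crushed, my days run out;<br/>The graveyard waits for me.<br/>"],
  isTruncated: false,
};
console.log(formatRefText(job17v1, "en"));
```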
debug format
When the debug URL param is 1, a debugData field is added to response.title and response.body. debugData is a 2-D array: the outer array has one element per citation, in the same order as response.title.results (or response.body.results), and each inner array has one element for each possible way of parsing that citation. Below is an example of a single parse (one element of an inner array), with comments explaining each field:
{
"orig_part_strs":[ // original input string broken into parts
"איוב",
"פרק יז"
],
"orig_part_types":[ // associated types for each part in `orig_part_strs`
"NAMED",
"NUMBERED"
],
"final_part_strs":[ // final part strings after processing (often times this is the same as `orig_part_strs` but can differ if in some cases)
"איוב",
"פרק יז"
],
"final_part_types":[ // associated types for each part in `final_part_strs`
"NAMED",
"NUMBERED"
],
"resolved_part_strs":[ // actual parts that were resolved. This is important because sometimes the parts are merged during resolution
"איוב",
"פרק יז"
],
"resolved_part_types":[ // associated types for each part in `resolved_part_strs`
"NAMED",
"NUMBERED"
],
"resolved_part_classes":[ // associated classes for each part in `resolved_part_strs`. This can help debug cases where context and ibid were used.
"RawRefPart",
"RawRefPart"
],
"context_ref":null, // If a context ref was used to resolve the citation (e.g. in the case of an ibid citation), it will appear here.
"context_type":null // associated type of context for `context_ref`.
}
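The pairing between results and debugData can be sketched as follows (JavaScript; explainCitations is a hypothetical helper). It reports how many parses each citation had and which context refs, if any, were used. The sample is the body section of the Example output, abridged to the fields the helper reads.

```javascript
// Hypothetical helper: line up each citation in "results" with its parses in
// "debugData" (debugData[i] holds every parse of results[i]).
function explainCitations(section) {
  return section.results.map((result, i) => {
    const parses = section.debugData?.[i] ?? [];
    return {
      text: result.text,
      refs: result.refs,
      parseCount: parses.length,
      // Context refs used by any parse, e.g. a citation found in the title.
      contextRefs: parses
        .filter((p) => p.context_ref)
        .map((p) => `${p.context_ref} (${p.context_type})`),
    };
  });
}

// "body" section of the Example output, abridged to the fields read above.
const bodySection = {
  results: [{ text: "בפסוק א", refs: ["Job 17:1"] }],
  debugData: [
    [{ orig_part_strs: ["בפסוק א"], context_ref: "Job 17", context_type: "CURRENT_BOOK" }],
  ],
};
console.log(explainCitations(bodySection));
```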
text format
The text field of the POST body should be formatted as follows:
{
"title": <str>, // text of the title of the document. May be empty string when there is not title. Any citations found in the title will be assumed as context for citations found in the "body". This is useful for articles that focus on one Perek of Tanakh for example.
"body": <str> // text of the body of the document
}
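A minimal sketch of building this body and sending it, assuming a JavaScript runtime with a global fetch (modern browsers or Node 18+). buildFindRefsBody and findRefs are hypothetical helpers. No Content-Type header is set, mirroring the cURL example; whether the endpoint requires a specific one is not stated here.

```javascript
// Hypothetical helper: serialize the POST body. The title/body pair is nested
// under a top-level "text" key; an empty title means "no title".
function buildFindRefsBody(body, title = "") {
  return JSON.stringify({ text: { title, body } });
}

// Hypothetical helper: POST to /api/find-refs and return the parsed JSON.
// params holds optional URL params, e.g. { with_text: 1, max_segments: 5 }.
async function findRefs(body, title = "", params = {}) {
  const url = new URL("https://www.sefaria.org/api/find-refs");
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  // No Content-Type header, mirroring the cURL example (an assumption).
  const response = await fetch(url, {
    method: "POST",
    body: buildFindRefsBody(body, title),
  });
  if (!response.ok) {
    throw new Error(`find-refs request failed: HTTP ${response.status}`);
  }
  return response.json();
}

// Print the body findRefs would send for the Example section's input.
console.log(buildFindRefsBody("ראה מה שכתוב בפסוק א.", "עיון על איוב פרק יז"));
```

findRefs is not called above because it sends a live request; a call would look like `findRefs(body, title, { with_text: 1 }).then(console.log)`.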
Example
Below is an example in cURL that uses all of the URL parameters; note that none of them are required. It shows how a citation found in the "title" field is used as context for citations in the "body" field: בפסוק א ("in verse 1") in the body resolves to Job 17:1 because the title cites איוב פרק יז (Job chapter 17).
Input
curl -X POST 'https://www.sefaria.org/api/find-refs?debug=1&with_text=1&max_segments=5' --data-raw '{"text":{"body": "ראה מה שכתוב בפסוק א.", "title": "עיון על איוב פרק יז"}}'
Output
{
"title":{
"results":[
{
"startChar":8,
"endChar":19,
"text":"איוב פרק יז",
"linkFailed":false,
"refs":[
"Job 17"
]
}
],
"refData":{
"Job 17":{
"heRef":"איוב י״ז",
"url":"Job.17",
"primaryCategory":"Tanakh",
"he":[
"רוּחִ֣י חֻ֭בָּלָה יָמַ֥י נִזְעָ֗כוּ קְבָרִ֥ים לִֽי׃",
"אִם־לֹ֣א הֲ֭תֻלִים עִמָּדִ֑י וּ֝בְהַמְּרוֹתָ֗ם תָּלַ֥ן עֵינִֽי׃",
"שִֽׂימָה־נָּ֭א עׇרְבֵ֣נִי עִמָּ֑ךְ מִ֥י ה֝֗וּא לְיָדִ֥י יִתָּקֵֽעַ׃",
"כִּֽי־לִ֭בָּם צָפַ֣נְתָּ מִּשָּׂ֑כֶל עַל־כֵּ֝֗ן לֹ֣א תְרֹמֵֽם׃",
"לְ֭חֵלֶק יַגִּ֣יד רֵעִ֑ים וְעֵינֵ֖י בָנָ֣יו תִּכְלֶֽנָה׃"
],
"en":[
"My spirit is crushed, my days run out;<br/>The graveyard waits for me.<br/>",
"Surely mocking men keep me company,<br/>And with their provocations I close my eyes.",
"Come now, stand surety for me!<br/>Who will give his hand on my behalf?",
"You have hidden understanding from their minds;<br/>Therefore You must not exalt [them].",
"He informs on his friends for a share [of their property],<br/>And his children’s eyes pine away.<br/>"
],
"isTruncated":true
}
},
"debugData":[
[
{
"orig_part_strs":[
"איוב",
"פרק יז"
],
"orig_part_types":[
"NAMED",
"NUMBERED"
],
"final_part_strs":[
"איוב",
"פרק יז"
],
"final_part_types":[
"NAMED",
"NUMBERED"
],
"resolved_part_strs":[
"איוב",
"פרק יז"
],
"resolved_part_types":[
"NAMED",
"NUMBERED"
],
"resolved_part_classes":[
"RawRefPart",
"RawRefPart"
],
"context_ref":null,
"context_type":null
}
]
]
},
"body":{
"results":[
{
"startChar":13,
"endChar":20,
"text":"בפסוק א",
"linkFailed":false,
"refs":[
"Job 17:1"
]
}
],
"refData":{
"Job 17:1":{
"heRef":"איוב י״ז:א׳",
"url":"Job.17.1",
"primaryCategory":"Tanakh",
"he":[
"רוּחִ֣י חֻ֭בָּלָה יָמַ֥י נִזְעָ֗כוּ קְבָרִ֥ים לִֽי׃"
],
"en":[
"My spirit is crushed, my days run out;<br/>The graveyard waits for me.<br/>"
],
"isTruncated":false
}
},
"debugData":[
[
{
"orig_part_strs":[
"בפסוק א"
],
"orig_part_types":[
"NUMBERED"
],
"final_part_strs":[
"בפסוק א"
],
"final_part_types":[
"NUMBERED"
],
"resolved_part_strs":[
"job",
"SectionContext(AddressPerek(0), 'Chapter', 17)",
"בפסוק א"
],
"resolved_part_types":[
"NAMED",
"NUMBERED",
"NUMBERED"
],
"resolved_part_classes":[
"TermContext",
"SectionContext",
"RawRefPart"
],
"context_ref":"Job 17",
"context_type":"CURRENT_BOOK"
}
]
]
}
}
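One way to use startChar and endChar is to turn the input back into HTML with links. The sketch below (JavaScript; linkify and escapeHtml are hypothetical helpers) does this for the example above, assuming results do not overlap. Leaving citations with linkFailed set, or with more than one candidate ref, as plain text is a design choice of the sketch, not documented behavior.

```javascript
// Hypothetical helper: escape text for safe insertion into HTML.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: wrap each unambiguous, linked citation in an <a> tag.
// Assumes results in the section do not overlap.
function linkify(inputText, section) {
  const results = [...section.results].sort((a, b) => a.startChar - b.startChar);
  let html = "";
  let cursor = 0;
  for (const r of results) {
    html += escapeHtml(inputText.slice(cursor, r.startChar));
    const citation = escapeHtml(inputText.slice(r.startChar, r.endChar));
    if (!r.linkFailed && r.refs.length === 1) {
      const href = "https://www.sefaria.org/" + section.refData[r.refs[0]].url;
      html += `<a href="${escapeHtml(href)}">${citation}</a>`;
    } else {
      html += citation; // failed or ambiguous: keep as plain text
    }
    cursor = r.endChar;
  }
  return html + escapeHtml(inputText.slice(cursor));
}

// Input and (abridged) output from the example above.
const title = "עיון על איוב פרק יז";
const titleSection = {
  results: [{ startChar: 8, endChar: 19, linkFailed: false, refs: ["Job 17"] }],
  refData: { "Job 17": { url: "Job.17" } },
};
const body = "ראה מה שכתוב בפסוק א.";
const bodySection = {
  results: [{ startChar: 13, endChar: 20, linkFailed: false, refs: ["Job 17:1"] }],
  refData: { "Job 17:1": { url: "Job.17.1" } },
};
console.log(linkify(title, titleSection));
console.log(linkify(body, bodySection));
// ראה מה שכתוב <a href="https://www.sefaria.org/Job.17.1">בפסוק א</a>.
```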
Debug using local webpage
To debug why certain citations aren't being linked, it is useful to use the linker.js plugin, which has debug tools built in. Below is a skeleton HTML page for testing content against the linker. The page embeds linker.js in debug mode, which shows all citations caught, including ones that weren't linked. See here for more information on how the debug option works.
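<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
</head>
<body>
<!-- Replace TITLE HERE and BODY HERE with the content you want to test -->
<h1>TITLE HERE</h1>
<p>BODY HERE</p>
<!-- Load linker.js and run it in debug mode, which shows every citation caught, including ones that weren't linked -->
<script type="text/javascript" charset="utf-8" src="https://www.sefaria.org/linker.v3.js"></script>
<script>sefaria.link({debug: true});</script>
</body>
</html>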