Syntax Analysis using Amazon Comprehend Syntax API with AWS SDK for Python (Boto3)

2019.02.25

This article was published more than a year ago. Please note that the information may be out of date.

Amazon Comprehend has announced support for syntax analysis. In this post, we run syntax analysis through the Amazon Comprehend Syntax API using the AWS SDK for Python (Boto3).

Amazon Comprehend Now Supports Syntax Analysis

Environment

$ pip list | grep boto3
boto3 1.9.2

Sample Code
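
A minimal sketch of such a call, assuming AWS credentials are already configured and that Amazon Comprehend is available in the chosen region. The input sentence is reconstructed from the tokens in the execution result; the names analyze_syntax, main, and TEXT, and the region us-east-1, are illustrative assumptions rather than the exact code used for this post.

```python
import json

# Input sentence, reconstructed from the tokens shown in the execution result.
TEXT = (
    "Amazon Comprehend is a natural language processing (NLP) service "
    "that uses machine learning to find insights and relationships in text."
)


def analyze_syntax(client, text, language_code="en"):
    """Call the Comprehend DetectSyntax operation and return the raw response."""
    return client.detect_syntax(Text=text, LanguageCode=language_code)


def main():
    # Imported here so analyze_syntax can be exercised without AWS access.
    import boto3

    # Assumption: the region is illustrative; use one where Comprehend is available.
    comprehend = boto3.client("comprehend", region_name="us-east-1")
    response = analyze_syntax(comprehend, TEXT)
    print(json.dumps(response, sort_keys=True, indent=4))
```

Calling main() sends the request and prints the response as sorted, indented JSON.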

Execution result

{
    "ResponseMetadata": {
        "HTTPHeaders": {
            "connection": "keep-alive",
            "content-length": "2758",
            "content-type": "application/x-amz-json-1.1",
            "date": "Wed, 12 Sep 2018 16:35:03 GMT",
            "x-amzn-requestid": "cc8b7643-b6a9-11e8-9f8f-71568a3ae70c"
        },
        "HTTPStatusCode": 200,
        "RequestId": "cc8b7643-b6a9-11e8-9f8f-71568a3ae70c",
        "RetryAttempts": 0
    },
    "SyntaxTokens": [
        {
            "BeginOffset": 0,
            "EndOffset": 6,
            "PartOfSpeech": {
                "Score": 0.9970498085021973,
                "Tag": "PROPN"
            },
            "Text": "Amazon",
            "TokenId": 1
        },
        {
            "BeginOffset": 7,
            "EndOffset": 17,
            "PartOfSpeech": {
                "Score": 0.9976467490196228,
                "Tag": "PROPN"
            },
            "Text": "Comprehend",
            "TokenId": 2
        },
        {
            "BeginOffset": 18,
            "EndOffset": 20,
            "PartOfSpeech": {
                "Score": 0.9982584118843079,
                "Tag": "VERB"
            },
            "Text": "is",
            "TokenId": 3
        },
        {
            "BeginOffset": 21,
            "EndOffset": 22,
            "PartOfSpeech": {
                "Score": 0.9999969005584717,
                "Tag": "DET"
            },
            "Text": "a",
            "TokenId": 4
        },
        {
            "BeginOffset": 23,
            "EndOffset": 30,
            "PartOfSpeech": {
                "Score": 0.9993355870246887,
                "Tag": "ADJ"
            },
            "Text": "natural",
            "TokenId": 5
        },
        {
            "BeginOffset": 31,
            "EndOffset": 39,
            "PartOfSpeech": {
                "Score": 0.996455729007721,
                "Tag": "NOUN"
            },
            "Text": "language",
            "TokenId": 6
        },
        {
            "BeginOffset": 40,
            "EndOffset": 50,
            "PartOfSpeech": {
                "Score": 0.9889174699783325,
                "Tag": "NOUN"
            },
            "Text": "processing",
            "TokenId": 7
        },
        {
            "BeginOffset": 51,
            "EndOffset": 52,
            "PartOfSpeech": {
                "Score": 0.9999988079071045,
                "Tag": "PUNCT"
            },
            "Text": "(",
            "TokenId": 8
        },
        {
            "BeginOffset": 52,
            "EndOffset": 55,
            "PartOfSpeech": {
                "Score": 0.9151285290718079,
                "Tag": "PROPN"
            },
            "Text": "NLP",
            "TokenId": 9
        },
        {
            "BeginOffset": 55,
            "EndOffset": 56,
            "PartOfSpeech": {
                "Score": 0.9999597072601318,
                "Tag": "PUNCT"
            },
            "Text": ")",
            "TokenId": 10
        },
        {
            "BeginOffset": 57,
            "EndOffset": 64,
            "PartOfSpeech": {
                "Score": 0.9986529350280762,
                "Tag": "NOUN"
            },
            "Text": "service",
            "TokenId": 11
        },
        {
            "BeginOffset": 65,
            "EndOffset": 69,
            "PartOfSpeech": {
                "Score": 0.9936331510543823,
                "Tag": "PRON"
            },
            "Text": "that",
            "TokenId": 12
        },
        {
            "BeginOffset": 70,
            "EndOffset": 74,
            "PartOfSpeech": {
                "Score": 0.9999306201934814,
                "Tag": "VERB"
            },
            "Text": "uses",
            "TokenId": 13
        },
        {
            "BeginOffset": 75,
            "EndOffset": 82,
            "PartOfSpeech": {
                "Score": 0.9979239702224731,
                "Tag": "NOUN"
            },
            "Text": "machine",
            "TokenId": 14
        },
        {
            "BeginOffset": 83,
            "EndOffset": 91,
            "PartOfSpeech": {
                "Score": 0.7294206023216248,
                "Tag": "VERB"
            },
            "Text": "learning",
            "TokenId": 15
        },
        {
            "BeginOffset": 92,
            "EndOffset": 94,
            "PartOfSpeech": {
                "Score": 0.9947968125343323,
                "Tag": "PART"
            },
            "Text": "to",
            "TokenId": 16
        },
        {
            "BeginOffset": 95,
            "EndOffset": 99,
            "PartOfSpeech": {
                "Score": 0.9998737573623657,
                "Tag": "VERB"
            },
            "Text": "find",
            "TokenId": 17
        },
        {
            "BeginOffset": 100,
            "EndOffset": 108,
            "PartOfSpeech": {
                "Score": 0.9998371601104736,
                "Tag": "NOUN"
            },
            "Text": "insights",
            "TokenId": 18
        },
        {
            "BeginOffset": 109,
            "EndOffset": 112,
            "PartOfSpeech": {
                "Score": 0.9999772310256958,
                "Tag": "CONJ"
            },
            "Text": "and",
            "TokenId": 19
        },
        {
            "BeginOffset": 113,
            "EndOffset": 126,
            "PartOfSpeech": {
                "Score": 0.9998776912689209,
                "Tag": "NOUN"
            },
            "Text": "relationships",
            "TokenId": 20
        },
        {
            "BeginOffset": 127,
            "EndOffset": 129,
            "PartOfSpeech": {
                "Score": 0.9999299049377441,
                "Tag": "ADP"
            },
            "Text": "in",
            "TokenId": 21
        },
        {
            "BeginOffset": 130,
            "EndOffset": 134,
            "PartOfSpeech": {
                "Score": 0.9992431402206421,
                "Tag": "NOUN"
            },
            "Text": "text",
            "TokenId": 22
        },
        {
            "BeginOffset": 134,
            "EndOffset": 135,
            "PartOfSpeech": {
                "Score": 0.9999969005584717,
                "Tag": "PUNCT"
            },
            "Text": ".",
            "TokenId": 23
        }
    ]
}

The response shows that the text has been tokenized and each token labeled with a part of speech, such as noun or verb. Each label also comes with a confidence score in the Score field.
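
As a rough sketch of how these fields might be used, the snippet below lists tokens whose tag score falls under a threshold and checks that each token's offsets slice the reported text out of the input. The three tokens are copied from the execution result; the helper names and the 0.95 threshold are assumptions for illustration, and the offset check assumes the offsets index directly into the Python string, which holds for this ASCII sentence.

```python
def low_confidence_tokens(response, threshold=0.95):
    """Return (text, tag, score) for tokens whose tag score is below threshold."""
    return [
        (t["Text"], t["PartOfSpeech"]["Tag"], t["PartOfSpeech"]["Score"])
        for t in response["SyntaxTokens"]
        if t["PartOfSpeech"]["Score"] < threshold
    ]


def offsets_match(text, response):
    """Check that each token's offsets slice out the text the token reports."""
    return all(
        text[t["BeginOffset"]:t["EndOffset"]] == t["Text"]
        for t in response["SyntaxTokens"]
    )


text = (
    "Amazon Comprehend is a natural language processing (NLP) service "
    "that uses machine learning to find insights and relationships in text."
)

# Three tokens copied from the execution result (TokenId 9, 14 and 15).
sample = {
    "SyntaxTokens": [
        {"BeginOffset": 52, "EndOffset": 55, "Text": "NLP", "TokenId": 9,
         "PartOfSpeech": {"Score": 0.9151285290718079, "Tag": "PROPN"}},
        {"BeginOffset": 75, "EndOffset": 82, "Text": "machine", "TokenId": 14,
         "PartOfSpeech": {"Score": 0.9979239702224731, "Tag": "NOUN"}},
        {"BeginOffset": 83, "EndOffset": 91, "Text": "learning", "TokenId": 15,
         "PartOfSpeech": {"Score": 0.7294206023216248, "Tag": "VERB"}},
    ]
}

for token_text, tag, score in low_confidence_tokens(sample):
    print(f"{token_text}: {tag} ({score:.3f})")
print("offsets consistent:", offsets_match(text, sample))
```

In this sample, "learning" tagged as VERB has the lowest score, which is the kind of label worth a second look before relying on it downstream.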

The parts of speech that the tags stand for are summarized below.

Tag Part of speech
ADJ Adjective
ADP Adposition
ADV Adverb
AUX Auxiliary
CONJ Coordinating conjunction
DET Determiner
INTJ Interjection
NOUN Noun
NUM Numeral
O Other
PART Particle
PRON Pronoun
PROPN Proper noun
PUNCT Punctuation
SCONJ Subordinating conjunction
SYM Symbol
VERB Verb
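
The table above can be carried into code as a simple lookup, for example to report readable names instead of raw tags. This is only a sketch: POS_NAMES mirrors the table as listed here, and count_parts_of_speech is a hypothetical helper, not part of Boto3.

```python
from collections import Counter

# Tag-to-name lookup copied from the table above.
POS_NAMES = {
    "ADJ": "Adjective",
    "ADP": "Adposition",
    "ADV": "Adverb",
    "AUX": "Auxiliary",
    "CONJ": "Coordinating conjunction",
    "DET": "Determiner",
    "INTJ": "Interjection",
    "NOUN": "Noun",
    "NUM": "Numeral",
    "O": "Other",
    "PART": "Particle",
    "PRON": "Pronoun",
    "PROPN": "Proper noun",
    "PUNCT": "Punctuation",
    "SCONJ": "Subordinating conjunction",
    "SYM": "Symbol",
    "VERB": "Verb",
}


def count_parts_of_speech(tags):
    """Count tags by readable name; unknown tags are kept as-is."""
    return Counter(POS_NAMES.get(tag, tag) for tag in tags)


# Tags of the first seven tokens in the execution result.
tags = ["PROPN", "PROPN", "VERB", "DET", "ADJ", "NOUN", "NOUN"]
for name, count in count_parts_of_speech(tags).most_common():
    print(f"{name}: {count}")
```

Falling back to the raw tag for unknown keys keeps the helper from failing if the service returns a tag that is not in this table.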

For details, please refer to the documentation below.

Syntax

Conclusion

Amazon Comprehend's syntax analysis tokenizes text and labels each token with its part of speech, along with a confidence score.

In this post, we illustrated syntax analysis using the Amazon Comprehend Syntax API with the AWS SDK for Python (Boto3).

For other Amazon Comprehend features (Keyphrase Extraction, Sentiment Analysis, Entity Recognition, Language Detection, and Topic Modeling), please refer to the post below.

How to use Amazon Comprehend operations using the AWS SDK for Python (Boto3)

Reference