.TH "MPSNNGraph" 3 "Mon Jul 9 2018" "Version MetalPerformanceShaders-119.3" "MetalPerformanceShaders.framework" \" -*- nroff -*- .ad l .nh .SH NAME MPSNNGraph .SH SYNOPSIS .br .PP .PP \fC#import \fP .PP Inherits \fBMPSKernel\fP, , and \&. .SS "Instance Methods" .in +1c .ti -1c .RI "(nullable instancetype) \- \fBinitWithDevice:resultImage:resultImageIsNeeded:\fP" .br .ti -1c .RI "(nullable instancetype) \- \fBinitWithDevice:resultImage:\fP" .br .ti -1c .RI "(nullable instancetype) \- \fBinitWithCoder:device:\fP" .br .ti -1c .RI "(nonnull instancetype) \- \fBinitWithDevice:\fP" .br .ti -1c .RI "(void) \- \fBreloadFromDataSources\fP" .br .ti -1c .RI "(\fBMPSImage\fP *__nullable) \- \fBencodeToCommandBuffer:sourceImages:sourceStates:intermediateImages:destinationStates:\fP" .br .ti -1c .RI "(\fBMPSImageBatch\fP *__nullable) \- \fBencodeBatchToCommandBuffer:sourceImages:sourceStates:intermediateImages:destinationStates:\fP" .br .ti -1c .RI "(\fBMPSImage\fP *__nullable) \- \fBencodeToCommandBuffer:sourceImages:\fP" .br .ti -1c .RI "(\fBMPSImageBatch\fP *__nullable) \- \fBencodeBatchToCommandBuffer:sourceImages:sourceStates:\fP" .br .ti -1c .RI "(\fBMPSImage\fP *__nonnull) \- \fBexecuteAsyncWithSourceImages:completionHandler:\fP" .br .in -1c .SS "Class Methods" .in +1c .ti -1c .RI "(nullable instancetype) + \fBgraphWithDevice:resultImage:resultImageIsNeeded:\fP" .br .ti -1c .RI "(nullable instancetype) + \fBgraphWithDevice:resultImage:\fP" .br .in -1c .SS "Properties" .in +1c .ti -1c .RI "NSArray< id< \fBMPSHandle\fP > > * \fBsourceImageHandles\fP" .br .ti -1c .RI "NSArray< id< \fBMPSHandle\fP > > * \fBsourceStateHandles\fP" .br .ti -1c .RI "NSArray< id< \fBMPSHandle\fP > > * \fBintermediateImageHandles\fP" .br .ti -1c .RI "NSArray< id< \fBMPSHandle\fP > > * \fBresultStateHandles\fP" .br .ti -1c .RI "id< \fBMPSHandle\fP > \fBresultHandle\fP" .br .ti -1c .RI "BOOL \fBoutputStateIsTemporary\fP" .br .ti -1c .RI "id< MPSImageAllocator > \fBdestinationImageAllocator\fP" .br .ti -1c .RI "\fBMPSImageFeatureChannelFormat\fP \fBformat\fP" .br .ti -1c .RI "BOOL \fBresultImageIsNeeded\fP" .br .in -1c .SS "Additional Inherited Members" .SH "Detailed Description" .PP Optimized representation of a graph of MPSNNImageNodes and MPSNNFilterNodes Once you have prepared a graph of MPSNNImageNodes and MPSNNFilterNodes (and if needed MPSNNStateNodes), you may initialize a \fBMPSNNGraph\fP using the \fBMPSNNImageNode\fP that you wish to appear as the result\&. The \fBMPSNNGraph\fP object will introspect the graph representation and determine which nodes are needed for inputs, and which nodes are produced as output state (if any)\&. Nodes which are not needed to calculate the result image node are ignored\&. Some nodes may be internally concatenated with other nodes for better performance\&. .PP Note: the \fBMPSNNImageNode\fP that you choose as the result node may be interior to a graph\&. This feature is provided as a means to examine intermediate computations in the full graph for debugging purposes\&. .PP During \fBMPSNNGraph\fP construction, the graph attached to the result node will be parsed and reduced to an optimized representation\&. This representation may be saved using the \fBNSSecureCoding\fP protocol for later recall\&. .PP When decoding a \fBMPSNNGraph\fP using a NSCoder, it will be created against the system default MTLDevice\&. If you would like to set the MTLDevice, your NSCoder should conform to the protocol\&. .PP You may find it helpful to set MPSKernelOptionsVerbose on the graph when debugging\&. To turn this on during \fBMPSKernel\fP initialization (including \fBMPSNNGraph\fP initialization) set the MPS_LOG_INFO environment variable\&. There is a lot of information about what optimizations are done to your graph, including some information on why certain optimizations were not made\&. .SH "Method Documentation" .PP .SS "\- (\fBMPSImageBatch\fP * __nullable) encodeBatchToCommandBuffer: (nonnull id< MTLCommandBuffer >) commandBuffer(NSArray< \fBMPSImageBatch\fP * > *__nonnull) sourceImages(NSArray< \fBMPSStateBatch\fP * > *__nullable) sourceStates" Convenience method to encode a batch of images .SS "\- (\fBMPSImageBatch\fP * __nullable) encodeBatchToCommandBuffer: (__nonnull id< MTLCommandBuffer >) commandBuffer(NSArray< \fBMPSImageBatch\fP * > *__nonnull) sourceImages(NSArray< \fBMPSStateBatch\fP * > *__nullable) sourceStates(NSMutableArray< \fBMPSImageBatch\fP * > *__nullable) intermediateImages(NSMutableArray< \fBMPSStateBatch\fP * > *__nullable) destinationStates" Encode the graph to a MTLCommandBuffer This interface is like the other except that it operates on a batch of images all at once\&. In addition, you may specify whether the result is needed\&. .PP \fBParameters:\fP .RS 4 \fIcommandBuffer\fP The command buffer .br \fIsourceImages\fP \fBA\fP list of MPSImages to use as the source images for the graph\&. These should be in the same order as the list returned from \fBMPSNNGraph\&.sourceImageHandles\fP\&. The images may be image arrays\&. Typically, this is only one or two images such as a \&.JPG decoded into a MPSImage*\&. If the sourceImages are MPSTemporaryImages, the graph will decrement the readCount by 1, even if the graph actually reads an image multiple times\&. .br \fIsourceStates\fP \fBA\fP list of \fBMPSState\fP objects to use as state for a graph\&. These should be in the same order as the list returned from \fBMPSNNGraph\&.sourceStateHandles\fP\&. May be nil, if there is no source state\&. If the sourceStates are temporary, the graph will decrement the readCount by 1, even if the graph actually reads the state multiple times\&. .br \fIintermediateImages\fP An optional NSMutableArray to receive any \fBMPSImage\fP objects exported as part of its operation\&. These are only the images that were tagged with \fBMPSNNImageNode\&.exportFromGraph\fP = YES\&. The identity of the states is given by -resultStateHandles\&. If temporary, each intermediateImage will have a readCount of 1\&. If the result was tagged exportFromGraph = YES, it will be here too, with a readCount of 2\&. .br \fIdestinationStates\fP An optional NSMutableArray to receive any \fBMPSState\fP objects created as part of its operation\&. The identity of the states is given by -resultStateHandles\&. .RE .PP \fBReturns:\fP .RS 4 \fBA\fP MPSImageBatch or MPSTemporaryImageBatch allocated per the destinationImageAllocator containing the output of the graph\&. It will be automatically released when commandBuffer completes\&. If resultIsNeeded == NO, then this will return nil\&. .RE .PP .SS "\- (\fBMPSImage\fP * __nullable) encodeToCommandBuffer: (nonnull id< MTLCommandBuffer >) commandBuffer(NSArray< \fBMPSImage\fP * > *__nonnull) sourceImages" Encode the graph to a MTLCommandBuffer .PP IMPORTANT: Please use [MTLCommandBuffer addCompletedHandler:] to determine when this work is done\&. Use CPU time that would have been spent waiting for the GPU to encode the next command buffer and commit it too\&. That way, the work for the next command buffer is ready to go the moment the GPU is done\&. This will keep the GPU busy and running at top speed\&. .PP Those who ignore this advice and use [MTLCommandBuffer waitUntilCompleted] instead will likely cause their code to slow down by a factor of two or more\&. The CPU clock spins down while it waits for the GPU\&. When the GPU completes, the CPU runs slowly for a while until it spins up\&. The GPU has to wait for the CPU to encode more work (at low clock), giving it plenty of time to spin its own clock down\&. In typical CNN graph usage, neither may ever reach maximum clock frequency, causing slow down far beyond what otherwise would be expected from simple failure to schedule CPU and GPU work concurrently\&. Regrattably, it is probable that every performance benchmark you see on the net will be based on [MTLCommandBuffer waitUntilCompleted]\&. .PP \fBParameters:\fP .RS 4 \fIcommandBuffer\fP The command buffer .br \fIsourceImages\fP \fBA\fP list of MPSImages to use as the source images for the graph\&. These should be in the same order as the list returned from \fBMPSNNGraph\&.sourceImageHandles\fP\&. .RE .PP \fBReturns:\fP .RS 4 \fBA\fP \fBMPSImage\fP or \fBMPSTemporaryImage\fP allocated per the destinationImageAllocator containing the output of the graph\&. It will be automatically released when commandBuffer completes\&. It can be nil if resultImageIsNeeded == NO .RE .PP .SS "\- (\fBMPSImage\fP * __nullable) encodeToCommandBuffer: (nonnull id< MTLCommandBuffer >) commandBuffer(NSArray< \fBMPSImage\fP * > *__nonnull) sourceImages(NSArray< \fBMPSState\fP * > *__nullable) sourceStates(NSMutableArray< \fBMPSImage\fP * > *__nullable) intermediateImages(NSMutableArray< \fBMPSState\fP * > *__nullable) destinationStates" Encode the graph to a MTLCommandBuffer .PP \fBParameters:\fP .RS 4 \fIcommandBuffer\fP The command buffer .br \fIsourceImages\fP \fBA\fP list of MPSImages to use as the source images for the graph\&. These should be in the same order as the list returned from \fBMPSNNGraph\&.sourceImageHandles\fP\&. The images may be image arrays\&. Typically, this is only one or two images such as a \&.JPG decoded into a MPSImage*\&. If the sourceImages are MPSTemporaryImages, the graph will decrement the readCount by 1, even if the graph actually reads an image multiple times\&. .br \fIsourceStates\fP \fBA\fP list of \fBMPSState\fP objects to use as state for a graph\&. These should be in the same order as the list returned from \fBMPSNNGraph\&.sourceStateHandles\fP\&. May be nil, if there is no source state\&. If the sourceStates are temporary, the graph will decrement the readCount by 1, even if the graph actually reads the state multiple times\&. .br \fIintermediateImages\fP An optional NSMutableArray to receive any \fBMPSImage\fP objects exported as part of its operation\&. These are only the images that were tagged with \fBMPSNNImageNode\&.exportFromGraph\fP = YES\&. The identity of the states is given by -resultStateHandles\&. If temporary, each intermediateImage will have a readCount of 1\&. If the result was tagged exportFromGraph = YES, it will be here too, with a readCount of 2\&. .br \fIdestinationStates\fP An optional NSMutableArray to receive any \fBMPSState\fP objects created as part of its operation\&. The identity of the states is given by -resultStateHandles\&. .RE .PP \fBReturns:\fP .RS 4 \fBA\fP \fBMPSImage\fP or \fBMPSTemporaryImage\fP allocated per the destinationImageAllocator containing the output of the graph\&. It will be automatically released when commandBuffer completes\&. .RE .PP .SS "\- (\fBMPSImage\fP * __nonnull) executeAsyncWithSourceImages: (NSArray< \fBMPSImage\fP * > *__nonnull) sourceImages(\fBMPSNNGraphCompletionHandler\fP __nonnull) handler" Convenience method to execute a graph without having to manage many Metal details This function will synchronously encode the graph on a private command buffer, commit it to a MPS internal command queue and return\&. The GPU will start working\&. When the GPU is done, the completion handler will be called\&. You should use the intervening time to encode other work for execution on the GPU, so that the GPU stays busy and doesn't clock down\&. .PP The work will be performed on the MTLDevice that hosts the source images\&. .PP This is a convenience API\&. There are a few situations it does not handle optimally\&. These may be better handled using [encodeToCommandBuffer:sourceImages:]\&. Specifically: .PP .nf o If the graph needs to be run multiple times for different images, it would be better to encode the graph multiple times on the same command buffer using [encodeToCommandBuffer:sourceImages:] This will allow the multiple graphs to share memory for intermediate storage, dramatically reducing memory usage\&. o If preprocessing or post-processing of the MPSImage is required, such as resizing or normalization outside of a convolution, it would be better to encode those things on the same command buffer\&. Memory may be saved here too for intermediate storage\&. (MPSTemporaryImage lifetime does not span multiple command buffers\&.) .fi .PP .PP \fBParameters:\fP .RS 4 \fIsourceImages\fP \fBA\fP list of MPSImages to use as the source images for the graph\&. These should be in the same order as the list returned from \fBMPSNNGraph\&.sourceImageHandles\fP\&. They should be allocated against the same MTLDevice\&. There must be at least one source image\&. Note: this array is intended to handle the case where multiple input images are required to generate a single graph result\&. That is, the graph itself has multiple inputs\&. If you need to execute the graph multiple times, then call this API multiple times, or better yet use [encodeToCommandBuffer:sourceImages:] multiple times\&. (See discussion) .br \fIhandler\fP \fBA\fP block to receive any errors generated\&. This block may run on any thread and may be called before this method returns\&. The image, if any, passed to this callback is the same image as that returned from the left hand side\&. .RE .PP \fBReturns:\fP .RS 4 \fBA\fP \fBMPSImage\fP to receive the result\&. The data in the image will not be valid until the completionHandler is called\&. .RE .PP .SS "+ (nullable instancetype) graphWithDevice: (nonnull id< MTLDevice >) device(\fBMPSNNImageNode\fP *__nonnull) resultImage" .SS "+ (nullable instancetype) graphWithDevice: (nonnull id< MTLDevice >) device(\fBMPSNNImageNode\fP *__nonnull) resultImage(BOOL) resultIsNeeded" .SS "\- (nullable instancetype) \fBinitWithCoder:\fP (NSCoder *__nonnull) aDecoder(nonnull id< MTLDevice >) device" \fBNSSecureCoding\fP compatability While the standard NSSecureCoding/NSCoding method -initWithCoder: should work, since the file can't know which device your data is allocated on, we have to guess and may guess incorrectly\&. To avoid that problem, use initWithCoder:device instead\&. .PP \fBParameters:\fP .RS 4 \fIaDecoder\fP The NSCoder subclass with your serialized \fBMPSKernel\fP .br \fIdevice\fP The MTLDevice on which to make the \fBMPSKernel\fP .RE .PP \fBReturns:\fP .RS 4 \fBA\fP new \fBMPSKernel\fP object, or nil if failure\&. .RE .PP .PP Reimplemented from \fBMPSKernel\fP\&. .SS "\- (nonnull instancetype) initWithDevice: (__nonnull id< MTLDevice >) device" Use initWithDevice:resultImage: instead .SS "\- (nullable instancetype) \fBinitWithDevice:\fP (nonnull id< MTLDevice >) device(\fBMPSNNImageNode\fP *__nonnull) resultImage" .SS "\- (nullable instancetype) \fBinitWithDevice:\fP (nonnull id< MTLDevice >) device(\fBMPSNNImageNode\fP *__nonnull) resultImage(BOOL) resultIsNeeded" Initialize a \fBMPSNNGraph\fP object on a device starting with resultImage working backward The \fBMPSNNGraph\fP constructor will start with the indicated result image, and look to see what \fBMPSNNFilterNode\fP produced it, then look to its dependencies and so forth to reveal the subsection of the graph necessary to compute the image\&. .PP \fBParameters:\fP .RS 4 \fIdevice\fP The MTLDevice on which to run the graph .br \fIresultImage\fP The \fBMPSNNImageNode\fP corresponding to the last image in the graph\&. This is the image that will be returned\&. Note: the imageAllocator for this node is ignored and the \fBMPSNNGraph\&.destinationImageAllocator\fP is used for this node instead\&. .br \fIresultIsNeeded\fP Commonly, when training a graph, the last \fBMPSImage\fP out of the graph is not used\&. The final gradient filter is run solely to update some weights\&. If resultIsNeeded is set to NO, nil will be returned from the left hand side of the -encode call instead, and computation to produce the last image may be pruned away\&. .RE .PP \fBReturns:\fP .RS 4 \fBA\fP new \fBMPSNNGraph\fP\&. .RE .PP .SS "\- (void) reloadFromDataSources " Reinitialize all graph nodes from data sources \fBA\fP number of the nodes that make up a graph have a data source associated with them, for example a \fBMPSCNNConvolutionDataSource\fP or a \fBMPSCNNBatchNormalizationDataSource\fP\&. Generally, the data is read from these once at graph initialization time and then not looked at again, except during the weight / parameter update phase of the corresponding gradient nodes and then only if CPU updates are requested\&. Otherwise, update occurs on the GPU, and the data in the data source is thereafter ignored\&. .PP It can happen, though, that your application has determined the graph should load a new set of weights from the data source\&. When this method is called, the graph will find all nodes that support reloading and direct them to reinitialize themselves based on their data source\&. .PP This process occurs immediately\&. Your application will need to make sure any GPU work being done by the graph is complete to ensure data coherency\&. Most nodes do not have a data source and will not be modified\&. Nodes that are not used by the graph will not be updated\&. .SH "Property Documentation" .PP .SS "\- (id) destinationImageAllocator\fC [read]\fP, \fC [write]\fP, \fC [nonatomic]\fP, \fC [retain]\fP" Method to allocate the result image from -encodeToCommandBuffer\&.\&.\&. This property overrides the allocator for the final result image in the graph\&. Default: \fBdefaultAllocator (MPSImage)\fP .SS "\- (\fBMPSImageFeatureChannelFormat\fP) format\fC [read]\fP, \fC [write]\fP, \fC [nonatomic]\fP, \fC [assign]\fP" The default storage format used for graph intermediate images This doesn't affect how data is stored in buffers in states\&. Nor does it affect the storage format for weights such as convolution weights stored by individual filters\&. Default: MPSImageFeatureChannelFormatFloat16 .SS "\- (NSArray >*) intermediateImageHandles\fC [read]\fP, \fC [nonatomic]\fP, \fC [copy]\fP" Get a list of identifiers for intermediate images objects produced by the graph .SS "\- (BOOL) outputStateIsTemporary\fC [read]\fP, \fC [write]\fP, \fC [nonatomic]\fP, \fC [assign]\fP" Should \fBMPSState\fP objects produced by -encodeToCommandBuffer\&.\&.\&. be temporary objects\&. See \fBMPSState\fP description\&. Default: NO .SS "\- (id<\fBMPSHandle\fP>) resultHandle\fC [read]\fP, \fC [nonatomic]\fP, \fC [assign]\fP" Get a handle for the graph result image .SS "\- (BOOL) resultImageIsNeeded\fC [read]\fP, \fC [nonatomic]\fP, \fC [assign]\fP" Set at -init time\&. If NO, nil will be returned from -encode calls and some computation may be omitted\&. .SS "\- (NSArray >*) resultStateHandles\fC [read]\fP, \fC [nonatomic]\fP, \fC [copy]\fP" Get a list of identifiers for result state objects produced by the graph Not guaranteed to be in the same order as sourceStateHandles .SS "\- (NSArray >*) sourceImageHandles\fC [read]\fP, \fC [nonatomic]\fP, \fC [copy]\fP" Get a list of identifiers for source images needed to calculate the result image .SS "\- (NSArray >*) sourceStateHandles\fC [read]\fP, \fC [nonatomic]\fP, \fC [copy]\fP" Get a list of identifiers for source state objects needed to calculate the result image Not guaranteed to be in the same order as resultStateHandles .SH "Author" .PP Generated automatically by Doxygen for MetalPerformanceShaders\&.framework from the source code\&.