Can Google Assistant interact with apps while they're open?



  • I would like to know whether Google Assistant can interact with apps while they're open, and whether you could kindly provide me with some examples.

    The rationale for my question follows:

    I was wondering whether Android supports making Google Assistant interact with apps while they're open. I know Google Assistant interfaces with apps which it can open by calling them with parameters. For example, if I tell it "call Joe on WhatsApp", it opens WhatsApp and WhatsApp immediately calls Joe. However, while WhatsApp is open I can bring up the Google Assistant overlay, but I cannot then tell it, after the call has been answered, "switch call to video". This would be useful.

    What I would like to know is whether this functionality is available in the Google Assistant API and, if so, why WhatsApp hasn't implemented the interface. If not, it could be quite useful. A blind person could easily make a call on any phone found in the house, even someone else's. Suppose a child next to the blind person wanted to call their mother and switch to video after the call, but was too young to do it alone: the blind person, or a person who had temporarily lost their vision due to an accident, health condition, or other cause, could do it for them. The mother could then intervene and offer her presence and, if needed, her help. I feel this would be an important accessibility feature that Google and WhatsApp could implement and support together.

    The phone the blind person normally uses with TalkBack could break or run out of battery. Or another adult in charge of the house might faint, and the blind person, realizing this, would need to take over using that person's phone, as trained and as fast as possible. They may not want to go through activating TalkBack, which may not have been configured on the other phone, and whose speech could confuse the child at hand into thinking the phone is doing things they don't want it to do. That could cause the child to grab the phone from the blind person and remain helpless, unable to make the call or do what they need to do on the phone.

    I wonder whether, in the near future, Google will implement the ability for users to interact with apps via Google Assistant while those apps are open, rather than just using Google Assistant to open them.

    Other apps could also benefit from interfacing with Google Assistant in this way, in countless manners. For instance, a music app could let the user change the music being played from the assistant. A camera app could let the user take a picture simply by telling the assistant after the app was opened. YouTube could play the cartoon the assistant told it to after the user opened YouTube from the assistant. These are all perfectly reasonable and extremely useful functions, and significant usability improvements for a blind user who has to deal with an impatient child who wants to see things on the phone but is too young to write, while the other parent is away from home at work.

    Thanks.



  • Can Google Assistant interact with apps while they're open?

    Yes and no, depending on what you mean by "Google Assistant" and "app".

    Google Assistant covers a number of features, but for the purposes of this answer I will cover two specific sets with unfortunately similar names:

    • Google Actions - https://developers.google.com/assistant/conversational These are written by third-party developers, similar to Amazon Alexa Skills: known-path voice conversations with an 'app' in the cloud. While their use is promoted with the interactive home speaker devices, you can also use them on Android devices. Other than the Google app itself, no third-party Android app is involved, though the same backend service for that third-party app is probably used. Realize that once you are in the 'voice app', it is using Google's voice infrastructure to transform speech to text, and even then only a limited set of synonyms is passed on to the 'voice app'.

    • Google App Actions / App Shortcuts - https://developers.google.com/assistant/app/reference/built-in-intents These are sent to a supporting Android app. Note that Android Intents are usually imperative actions which the app then handles fully.

    See: https://developer.android.com/guide/topics/ui/shortcuts#shortcut-capabilities
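    As a rough sketch of how an app opts into App Actions, it declares a capability for a built-in intent in its shortcuts.xml. The built-in intent name actions.intent.CREATE_CALL and the parameters call.callFormat and call.participant.name come from the built-in intents reference linked above; the package, class, and key names below are invented for illustration:

    ```xml
    <!-- Hypothetical shortcuts.xml fragment: maps the CREATE_CALL
         built-in intent to an (invented) calling Activity. The
         call.callFormat parameter is how "audio" vs "video" would
         be conveyed to the app. -->
    <shortcuts xmlns:android="http://schemas.android.com/apk/res/android">
      <capability android:name="actions.intent.CREATE_CALL">
        <intent
            android:action="android.intent.action.VIEW"
            android:targetPackage="com.example.messaging"
            android:targetClass="com.example.messaging.CallActivity">
          <parameter
              android:name="call.participant.name"
              android:key="contactName" />
          <parameter
              android:name="call.callFormat"
              android:key="callFormat" />
        </intent>
      </capability>
    </shortcuts>
    ```

    Note that this only covers launching into the app with parameters; it does not give the Assistant a channel into the app once it is already in the foreground, which is exactly the gap the question describes.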

    Once the Android Intent is fired, the receiving app is brought to the front, and historically the foreground app takes priority, except for media playback, notifications, or incoming calls. Only with Android 10 and onward is it officially possible to support https://developer.android.com/guide/topics/media/sharing-audio-input which of course brings its own privacy/security concerns.

    TalkBack is a Google accessibility app, separate from Google Assistant, which provides sight-assistive navigation and user input.

    So how does TalkBack work?

    When Android was released it supported D-pad navigation and the concept of focus for each individual visual element used by a typical app developer (games excluded). Adding the additional metadata about each visual field doesn't normally require a major code change; see https://developer.android.com/guide/topics/ui/accessibility/principles . Using Android's accessibility APIs is how TalkBack provides sight-assistive navigation and user input.
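    To make the "additional metadata" concrete, here is a minimal layout fragment of the kind that page describes. The view id, drawable, and label text are invented; android:contentDescription is the real attribute TalkBack reads aloud for a purely visual control:

    ```xml
    <!-- Hypothetical layout fragment: the metadata an accessibility
         service like TalkBack consumes. Without contentDescription,
         an icon-only button is announced as just "unlabeled button". -->
    <ImageButton
        android:id="@+id/switch_to_video"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:src="@drawable/ic_video"
        android:contentDescription="Switch call to video" />
    ```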

    So can't I speak to the third-party Android app to make it do things in a natural way?

    What you describe requires deeper integration with TalkBack. But TalkBack isn't necessarily the only accessibility app available; any APIs it uses should exist in the Android framework. Alternatively, the third-party Android app itself can integrate speech-to-text features.

    Why does the third-party voice app, or Google Action, sound more natural/understandable than TalkBack?

    Those services are designed voice-first during development and use Speech Synthesis Markup Language (SSML) to add the prosody necessary to make the output sound more natural.

    Android does have its own API, https://developer.android.com/reference/android/text/style/TtsSpan?hl=en , for adding some prosody elements to spoken output. However, it doesn't support SSML directly ( https://stackoverflow.com/q/62436229/295004 ), which means more effort for developers who do want to support more natural speech output.
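    For comparison, this is the kind of SSML a voice-first service can emit but a TtsSpan-based Android app cannot express directly. The wording is invented; break, emphasis, and prosody are standard SSML elements:

    ```xml
    <!-- Hypothetical SSML response: pauses, emphasis, and rate changes
         are what make cloud voice apps sound more natural than plain
         text-to-speech output. -->
    <speak>
      Your call is ringing.
      <break time="400ms"/>
      Say <emphasis level="moderate">switch to video</emphasis>
      <prosody rate="slow">at any time.</prosody>
    </speak>
    ```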

    So what does that mean?

    If a user says "add 5 to number field", context is important: what does "add 5" mean, and which "number field" is being referred to? Voice-first apps are designed to respond with "which number field, x or y?" or to reset the conversation so that the user and the app are synced up. Android apps are designed primarily to be visual, with cues to make input conform to expectations (allowed letters, numbers, length, etc.); accessibility and voice are extra supported features, not primary ones.
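    The "which number field, x or y?" behaviour above can be sketched as plain logic, independent of any Android API. Everything here (class and method names, field names) is invented for illustration; the point is only that a voice-first layer asks a clarifying question instead of guessing:

    ```java
    // Hypothetical sketch of voice-first disambiguation for a command
    // like "add 5 to number field". Not a real Assistant or Android API.
    import java.util.List;

    public class VoiceDisambiguator {

        // If the spoken target names exactly one known field, act on it;
        // otherwise ask a clarifying question so user and app stay in sync.
        public static String handle(String target, List<String> numberFields) {
            if (numberFields.contains(target)) {
                return "Adding 5 to " + target;
            }
            return "Which number field, " + String.join(" or ", numberFields) + "?";
        }

        public static void main(String[] args) {
            List<String> fields = List.of("x", "y");
            System.out.println(handle("number field", fields)); // ambiguous -> ask
            System.out.println(handle("x", fields));            // resolved -> act
        }
    }
    ```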

    TalkBack, or any third-party accessibility app, has only the limited knowledge expressed by the accessibility tags for the fields on the current screen, and has no knowledge of any other screen. No Android framework API exists for an Android app developer to tell the accessibility app "if the user says xyz, do foobar".

    So what about integrating speech-to-text directly into the Android app?

    Speech-to-text is currently a paid service for third-party apps and is done in the cloud. While on-device speech-to-text is possible, app developers may simply depend on the keyboard app (Gboard, an OEM keyboard, or others) to handle that aspect. For example, the search field in the YouTube for Android app lets you speak your search topic by tapping the microphone icon. The Facebook for Android app has no such in-app feature and depends upon the keyboard app to feed it text. Do realize that, until recently, good-quality voice recognition required sending your voice to the cloud for parsing.

    But since Android 10 and up now allow for shared microphone access, further feature integration is possible.

    So, to the original question: yes, but it would depend on the developer of the accessibility app (Google or another party) and on the hundreds of thousands of third-party app developers supporting any such API, and to my knowledge there is no API for that in the Android framework.



