I'm querying an Insights endpoint with the Facebook Python SDK and am having a hard time making the response work with Python and, subsequently, pandas. I make the following call:
from facebookads.objects import AdAccount, Insights

account = AdAccount('act_id')
params = {
    'fields': [Insights.Field.impressions, Insights.Field.clicks, Insights.Field.actions, Insights.Field.spend],
    'breakdowns': [Insights.Breakdown.hourly_stats_aggregated_by_advertiser_time_zone],
    'time_range': {
        'since': 'start',
        'until': 'end',
    },
    'action_attribution_windows': ['7d_click'],
}
result = account.get_insights(params=params)
print(result)
which returns data like so:
[<Insights> {
    "actions": [
        {
            "7d_click": 600,
            "action_type": "custom_event_xyz",
            "value": 50
        },
        {
            "7d_click": 600,
            ....
        }
    ],
    "clicks": 1500,
    "date_start": "start",
    "date_stop": "end",
    "hourly_stats_aggregated_by_advertiser_time_zone": "00:00:00 - 00:59:59",
    "impressions": "60000",
    "spend": 60
}, <Insights> {
    ....
]
When putting the data (excluding the actions data) into a pandas DataFrame, I do not manage to properly flatten the actions data so that the aggregation level is consistent (i.e. the "actions" keys as column headers). Having checked online and on Stack Overflow, neither loading the JSON with plain Python and processing it accordingly nor reading it directly with pandas works for me.
Summing up, I don't see how I can elegantly mine the deeper nested parts of the response and easily make the contents compatible with the rest.
You just ran into the very same problem I did. get_insights returns data that looks like JSON, but it isn't. You can check what type it is with type(result): it is a facebookads.objects.EdgeIterator. You can convert it with result = [x for x in result]; now result is a plain list! After that you can use pandas to do whatever you want, e.g. pandas.DataFrame(result).
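A minimal sketch of that conversion, reusing the account and params from the question; converting each item to a plain dict (as also suggested below) keeps pandas happy:

import pandas as pd

result = account.get_insights(params=params)
print(type(result))  # facebookads.objects.EdgeIterator, not a list or JSON

# Materialize the iterator into a list of plain dicts, then build the DataFrame
rows = [dict(x) for x in result]
df = pd.DataFrame(rows)
print(df.head())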
You can also convert each item to a dict:
for item in result:
    data = dict(item)
    print(data)
Now you can work with the dict as key/value pairs or even convert it back to JSON.
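For example, a small sketch of both options, assuming data is one of the dicts from the loop above:

import json

# Walk the dict as key/value pairs
for key, value in data.items():
    print(key, value)

# Or serialize it back to a JSON string
print(json.dumps(data, indent=2))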
You should use append to collect all the items in one variable:
results = []
for item in report:  # report is the response returned by get_insights()
    data = dict(item)
    results.append(data)
print(results)
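To get back to the original question of flattening the nested actions, one possible approach (not part of the answer above, so treat it as a sketch) is to expand each action_type into its own column after building the DataFrame; this assumes the '7d_click' attribution window from the sample response:

import pandas as pd

df = pd.DataFrame(results)

# Turn each row's "actions" list into {action_type: 7d_click value} and expand to columns
def flatten_actions(actions):
    if not isinstance(actions, list):
        return {}
    return {a.get("action_type"): a.get("7d_click") for a in actions}

actions_df = df["actions"].apply(flatten_actions).apply(pd.Series)
flat_df = pd.concat([df.drop(columns=["actions"]), actions_df], axis=1)
print(flat_df.head())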
While this question asks specifically about a pandas DataFrame, you may have found it via a Google search when trying to work with Facebook API data in a PySpark/Databricks DataFrame (I did).
For PySpark, you need just one extra step: your list needs to consist of PySpark Rows instead of regular Python dicts.
Here's a convenience function that will transform your FB API response into a Spark-friendly form.
from pyspark.sql import Row

def facebook_result_to_rows(facebook_api_response):
    return (Row(**dict(x)) for x in facebook_api_response)
And an example of how I used it:
insights = fb_ad_account.get_insights()
# insights is a python "list" containing many "AdsInsights" objects whose data we want. try print(insights) to see what I mean.
# Those objects are instances of the class AdsInsights found in facebook_business.adobjects.adsinsights. Other API calls have similar response objects.
# Convert said objects into something PySpark can actually work with
frame_data = facebook_result_to_rows(insights)
# Make a dataframe from our transformed data
pyspark_dataframe = spark.createDataFrame(frame_data)
Now you have a pyspark_dataframe with which you can do all those native PySpark things like .saveAsTable(), unionByName(), join, filter, etc.
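For instance, a minimal sketch of such follow-up operations; the table name "my_schema.fb_insights" is just a placeholder:

# Keep rows that actually have impressions, then persist the result as a table
(pyspark_dataframe
    .filter(pyspark_dataframe.impressions.isNotNull())
    .write
    .mode("overwrite")
    .saveAsTable("my_schema.fb_insights"))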