Archives: big data

Big data: OLTP, OLAP

Row-based vs. column-based databases and formats:
Row-based -> good for OLTP (transactions), e.g. Cassandra.
Column-based -> good for OLAP (easier aggregation, etc.?), e.g. Druid, Parquet.
Parquet: https://www.jumpingrivers.com/blog/parquet-file-format-big-data-r/

Hadoop: big data storage. What are the alternatives? S3 in the cloud?
https://www.alluxio.io/learn/hdfs/basic-file-operations-commands/
https://stackoverflow.com/questions/31011078/data-retention-in-hadoop-hdfs

Pinot vs. Cassandra, Druid: https://imply.io/post/apache-cassandra-vs-apache-druid
If your queries ALWAYS constrain …
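To make the row vs. column distinction in these notes concrete, here is a minimal in-memory sketch in plain Python. It is not how Cassandra, Druid, or Parquet store data internally; the sample records, field names, and the helpers `to_columns` and `get_order` are all made up for illustration.

```python
# Hypothetical sample records; the field names are invented for this sketch.
rows = [
    {"order_id": 1, "customer": "alice", "amount": 30},
    {"order_id": 2, "customer": "bob", "amount": 45},
    {"order_id": 3, "customer": "alice", "amount": 25},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def get_order(records, order_id):
    """OLTP-style access: return one whole record by its key."""
    for record in records:
        if record["order_id"] == order_id:
            return record
    return None


columns = to_columns(rows)

# OLTP-style: the row layout keeps every field of one record together.
order = get_order(rows, 2)

# OLAP-style: the column layout lets an aggregate read a single field
# without touching the other fields at all.
total_amount = sum(columns["amount"])

print(order)
print(total_amount)
```

The point of the sketch is only the access pattern: a transactional lookup wants one record with all its fields, while an aggregate wants one field across all records.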


Big data platform with Kubernetes or Hadoop

Hadoop vs. Kubernetes:
Compute: MapReduce (Hadoop) vs. Spark on K8s, Flink streaming (Kubernetes)
Storage: HDFS (Hadoop) vs. S3? Any better one? (Kubernetes)
Resource manager: YARN/Mesos (Hadoop) vs. K8s itself (Kubernetes)

During its evolution phase, Hadoop provided three main functionalities that made it a Big Data-ready solution: a distributed compute mechanism (MapReduce), robust data storage (HDFS), and a resource manager (YARN/Mesos). But modern technologies now …
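As a rough illustration of the MapReduce model mentioned above, here is a single-process word count in plain Python. It does not use Hadoop's API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and a real cluster would run the map and reduce steps in parallel across machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce sequentially over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data", "Big storage"]))
```

Engines such as Spark or Flink expose a similar map-then-aggregate shape through their own APIs, which is part of why the compute layer can be swapped independently of storage and resource management.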



Apache Kafka big picture and quick start

What is Apache Kafka? (big picture) I found that the article http://www.confluent.io/blog/stream-data-platform-1/ (from Jay Kreps) presents a very good big picture of what Kafka is supposed to do: you can use Kafka to build a stream data platform. Here are the pictures from that article. The big idea is simple: many business processes can be modeled …